CodeTriangle's Curiosities

Subversion Deserves a Passing Thought Every Once in a While

  1. The One Big Difference
    1. Distributed vs. Centralized
    2. SVN is an Always-Online Game
    3. The Downside of Distributed
    4. My Lukewarm Take
  2. Other Differences
    1. The Copy Operation
    2. Branching and Tagging
    3. Command Structure
  3. Closing Thoughts

Recently, for work, I've been learning Subversion (or, for brevity, SVN). I have some thoughts, which I had been sharing with some friends on Discord. After they expressed some interest in a more complete, more well-thought-out version of my thoughts, I decided to turn my thoughts into this blog's inaugural post.

I, like many people, learned version control using git. My first brush with the concept of a VCS came with my first brush with Java, when, within my first tutorial, I was recommended to use the Eclipse IDE and GitHub.

GitHub, obviously, has only grown in popularity since its inception to the extent that, up until recently, I have never had to use anything other than git for any project I've ever wanted or needed to contribute to. I was aware of SVN but only as the second member on every list of version control systems. I had also at one point been on the hompeage of Fossil for whatever that's worth and had heard whisperings of something called Mercurial.

Recently, however, I've become involved with several projects at work which use SVN instead of git. So I sat myself down at my desk, pulled up the SVN book and prepared myself to be open-minded. I love git, and am a great promoter of its use, but I was willing to entertain the alternate side of the debate.

I am a few hours more experienced with Subversion now after skimming chapter one, studying chapter two and chapter three pretty intently and, a bit later, marathoning chapter four. And I have some thoughts.

Before we continue further, I'd like to lay down some expectations. I expect the audience to have used git before and to be able to understand all of its basic commands, namely, branch, tag, and merge. No knowledge of Subversion is required as this article is meant to educate git users on the differences between the two. In fact, the less you know about Subversion, the smarter I'll look. If I misstate something, maybe consider forgetting a bit about Subversion so that I can keep up the illusion.

The One Big Difference

Let's address the one main point first. Almost all of the differences between svn and git inspire fascinating debates about the design of VCS tools and, even more generally, of computer tools in general. How should various tools be made to fit together? How much structure should a system enforce on its user? These are questions that I have every intent of circling back to, but those differences are eclipsed by the fact that SVN is the popular centralized VCS and git is the popular distributed VCS.

Distributed vs. Centralized

In a distributed model like git's, (which I assume the reader is more familiar with) every user downloads the full history of the repository to their computer, makes changes, commits those changes locally, then pushes those commits to the server.

In a centralized model like Subversion's, the only complete copy of the repository with all the history intact is stored on the server. Each user's working copy tracks exactly one revision at a time. Creating a commit in Subversion updates the one canonical tree on the server to match your working copy. The "commit" and "push" options are therefore one and the same.

SVN is an Always-Online Game

In comparison to a distributed model, a centralized model is much simpler. Having multiple copies of the same tree (all with their own minimal modifications) stored on many different devices, whose changes must later be reconciled in event of any kind of conflict, is in theory much more cluttered and confusing than the inherent simplicity of a single canonical tree. Initial checkouts are additionally significantly faster due to the fact that only one revision needs to be downloaded.

On the other hand, this means creating a commit requires an internet connection. As does interacting with the history of the repository in any form, including the simple act of showing a log. This naturally means that these operations take much longer to perform than the equivalent git commands, scaling with your network speed.

Now, reader, I have Xfinity, so my continuity of service is terrible. I can recall at least one incident every month where I was hacking on a project late at night and my modem would spontaneously turn off. I would pull up my Xfinity app, switch my phone to use mobile data, and confirm that, yep, there's an outage in my area. As all of my current personal projects use git, this has never halted my development. Sure, I won't be able to push, but I have never had to take a break from coding due to an outage.

(Also, Xfinity, I know that these aren't "unexpected outages" like you keep on claiming. If you have to do maintenance on my network every night for three nights at the same time of day, you can just tell me that you're doing maintenance. Also, you have my phone number, so there's no reason not to send me a text when things like this happen.)

This problem already occurred for me before I even made my first SVN commit! I had read the first three chapters of the SVN book, checked out a working copy of the repository, and finally, I was ready to start working on my branch when the SVN server started refusing my connections. I emailed the sysadmin for the SVN server to clarify the issue. Turns out, the server had gone down. It was for this exact reason that I decided to dig into the fourth chapter of the SVN book.

Outages can and will happen. Because of that, it is crucial to plan around them. SVN still has no good answer to these questions.

The Downside of Distributed

I would like to now bring up the best argument for centralized version control. This argument is made rather eloquently in two blog posts (1) (2) by Ben Collins-Sussman, one of the originators of Subversion. These are definitely worth a read. They're short and sweet, but as fascinating as an article with ten times their word count. The basic gist involves two anti-patterns that centralized models encourage but decentralized models have limited protection against: bombing and caving.

Caving or crawling into a cave refers to the act of writing an entire new feature offline and without communicating with the team. Bombing or dropping a code bomb refers to the act of committing such a feature all in one go. When a user drops a code bomb, they deprive other collaborators of the ability to help design the feature, give input on code structure, and point out flaws in the feature's implementation.

When using a centralized VCS, it is very easy to see this behavior as maliciously anti-cooperative. It takes a significant amount of effort to never commit any of your changes until they're all complete. Furthermore, lumping the entire feature together in one commit defeats the purpose of version control in the first place. Simply put, if you're dropping a code bomb in a Subversion repository, you're using Subversion wrong.

git, on the other hand, seems to encourage this behavior by way of its very design. All development can be, and by default is, done without communicating with the project's developers. With Subversion, you must go out of your way to avoid showing your work to other collaborators. With git, secrecy is the standard. GitHub takes this one step further by introducing the idea that you can and should create your own entire fork of the repository so that your code is never seen until you decide to show it off. It is possible to employ good version control practices with a distributed VCS, but it doesn't promote these practices by default.

This argument has grabbed hold of my mind as very few other concepts have in my entire career. This section was meant to be a paragraph expressing my opinion that yes, it's important to actively pursue and gracefully receive code review, but that Subversion's dependence on constant internet access is an intolerable price to pay for this benefit. I still do believe that, but in the process of learning these things I felt my philosophy on open-source contributing gradually shift, hence the much greater focus I have placed on these posts.

My Lukewarm Take

I don't know if it's possible to conclude this section in a satisfying way. If I had to try, I guess I would just say that I still firmly believe that a distributed VCS is better than a centralized one, but that we may find value looking towards the centralized model to provide a good example of how we may reconsider the structure of project contributions.

Other Differences

It really is a shame that git is the popular distributed VCS and SVN is the popular centralized VCS because, beyond the difference in interaction model, there are plenty of fascinating debates to be had regarding the two systems' other differences, but those arguments rarely get the spotlight compared to the question of distributed vs. centralized. I'm clearly not above this (see how much longer the previous section is than any of the ones coming up?) but I did find some elements of the SVN book nothing less than fascinating.

The Copy Operation

One fascinating difference between SVN and git is the list of basic operations that you can perform on files. In git, these are create, modify, and delete. A move is performed by detecting the similarity of an added and a deleted file. Subversion has create, modify, and delete, just like git. But it also has copy. In particular, this preserves the file's revision history for both the original and the copy. A move in Subversion is implemented as running svn copy followed by svn delete.

I actually like how Subversion's move operation is more explicit than git's. git's copy detection can be a bit finnicky at times if the move comes along with additional changes to the file being moved or, occasionally, the changes to other files. Plus, it just relies too much on the machine detecting your intentions, which I tend to distrust.

But that's not the only reason that svn copy is cool. Like I said, SVN's copy operation maintains the history of the object. Under the hood, this works similarly to a UNIX hardlink where the underlying file object is the same, but it is simply accessed from different locations on the filesystem. Unlike a hardlink, though, the original and the copy can be modified separately from one another.

Okay, but how often do you need to clone a file and work on a new copy on the same branch? True, that is a rare case. However, that's not the only use case for svn copy.

Branching and Tagging

In git, branches and tags are features built directly into the core system. You may be surpised to hear that this is not necessarily the case in SVN. Branching and tagging are still supported, but not as separate operations.

In Subversion, the root of the repository is generally not the root of the code. Instead, the main line of development usually lives in a directory called trunk and branches usually exist as copies of trunk placed inside of another root directory called branches. To create a branch called feature-1, the command to run from the root of the repository is:

svn copy \
    https://example.com/svn/project/trunk \
    https://example.com/svn/project/branches/feature-1

This is an almost zero-cost operation on the server side because, remember, svn copy does not create new objects, just new links to old objects. By then using svn switch you can swap your working copy from tracking trunk to branches/feature-1 and commit as normal.

Similarly, tags live as copies of trunk inside of tags.

Crucially, this allows the project administrator much more leeway regarding how to structure the project. For example, the maintainers might decide to arrange branches and tags heirarchically, for instance, by giving each user a directory within branches; by creating different subdirectories for releases, betas, and RCs within tags; or by defying the traditional directory structure entirely. It all just depends on what works best for your project.

In a really neat way, this also allows for user-defined innovations in repository structure, which is not possible when branches and tags are first-class features like in git.

To delete a branch, you can run:

svn delete https://example.com/svn/project/branches/feature-1

And, you can also resurrect a branch by running:

svn copy \
    https://example.com/svn/project/branches/feature-1@4541 \
    https://example.com/svn/project/branches/feature-1

where 4541 is replaced with the revision number of the commit before the item was deleted.

(Oh right, revisions are numbered in sequential order. This does not cause any issues because one server is in control of keeping track of the commits.)

You might find this inelegant, and that is a completely fair opinion, but I can't deny the aesthetic appeal that this has for me.

Command Structure

Notice how versatile the svn copy command has been so far. This exemplifies a trend that I noticed in the way that Subversion designs its tools. There are generally fewer commands that do a variety of related tasks as opposed to git's many commands which each have one or two applications each. Despite this, I find the syntax of SVN commands surprisingly consistent and easy to follow.

I think there is an inherent merit to having a small set of versatile tools. The underlying operations of every use case of svn merge is the same, even though the command is capable of doing vastly different things.

This is where I would insert a rant on the UNIX philosophy if I were a better tech blogger. But, I don't believe I have anything productive to add to that conversation at the moment.

Closing Thoughts

So, as a git user, should you drop everything and switch to Subversion? No, obviously not. git is the standard in version control, for better or for worse. Programmers had a choice between a somewhat inconvenient system in Subversion and a much more convenient system in git. Is it any surprise that we chose convenience, even if the other option encourages better development practices? How very human of us.

If you are starting a new project in 2024 and you choose something other than git for your version control system, you limit your pool of potential contributors. That may not be a bad thing. If you have a project that you plan to work on by yourself or with a small team and you're interested in trying SVN, I say, there's no harm in it. See how you like it! Just, don't expect many more people to flock to you.

I struggled with titling this article. I considered titling it "Subversion deserves a second chance" but, really, it doesn't. Distributed VCS was developed in response to Subversion, which had clear market dominance up until its release. No, Subversion had its time and programmers voted it out. I have no reason to believe that it needs a second term.

However, subversion is fascinating and does not deserve to be forgotten. It is an undeniable piece of this industry's rich history. Additionally, from a design standpoint, there are some fascinating things that we've seemingly just given up with the move to git.

For that reason, Subversion does deserve something in our modern world. Subversion deserves at the very least a passing thought every once in a while.

Tags: #version-control #subversion #git