Git for non-hackers pt. 1: Organizing your research one commit at a time // Cogsci

Version control is all the rage in academia. And when people talk about version control, they generally mean Git, which is by far the most popular version-control system. But what exactly is Git? We all want to control our versions, especially once we have experienced versions-run-amok situations like this one:

document-v1-latest_(commented)-trackchanges_3.2-wed_12:00.docx

But how?!

In very simple terms, Git is a program that allows you to take snapshots of your files at a particular moment. A snapshot is called a ‘commit’. A ‘repository’ is a collection of files that are tracked by Git. If you are familiar with Dropbox, there is an obvious parallel: your Dropbox folder is your repository, and Dropbox automatically ‘commits’ each and every change. But Git is far more flexible, and it puts you in control: you decide when a snapshot is taken and which changes go into it.
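
To make the snapshot idea concrete, here is a toy model in Python. It is only a sketch of the concept, not of how Git actually works. The class ToyRepository and its methods are made up for illustration, and the ‘files’ are just entries in a dictionary. The key point is that a commit stores a frozen copy of your files. You can keep editing afterwards and still restore that copy later.

```python
import copy


class ToyRepository:
    """Toy model of 'a commit is a snapshot of your files'.

    Hypothetical illustration only: real Git stores its snapshots far more
    efficiently, but the underlying idea is the same.
    """

    def __init__(self):
        self.files = {}    # the working folder: file name -> file contents
        self.commits = []  # list of (commit message, snapshot) pairs

    def commit(self, message):
        # Store an independent copy, so later edits cannot alter the snapshot
        self.commits.append((message, copy.deepcopy(self.files)))

    def restore(self, index):
        # Make the working folder look exactly like an earlier snapshot
        self.files = copy.deepcopy(self.commits[index][1])


repo = ToyRepository()
repo.files["readme.txt"] = ""
repo.commit("Add readme.txt")

repo.files["readme.txt"] = "These are some changes that I would like to commit."
repo.commit("Some changes to readme.txt")

repo.restore(0)      # go back to the first snapshot
print(repo.files)    # {'readme.txt': ''}
```

In this model, Dropbox would be like calling commit() automatically after every single edit. With Git, you call it yourself.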

Git was developed by Linus Torvalds to manage the development of the Linux kernel. Managing a project as large as the Linux kernel is very complicated, and Git has lots of advanced functionality that allows people to work in parallel on the same project without things drifting hopelessly apart. Therefore, Git can be a tool for hardcore nerds. But it doesn’t need to be. Git is equally suitable for managing a simple, one-man project, and in that case Git is very simple to use.

For this tutorial, I will assume that you are running Windows 7, because most of my readers do. But with minor variations, everything is applicable to other platforms as well.

So let’s begin! You can download Git for free from:

http://www.git-scm.com/

During the installation, you can simply answer ‘Next’ and ‘Ok’ to the various questions that pop up; the defaults are fine. Once Git has been installed, create an empty folder somewhere and call it my-git-project. If you right-click on the newly created folder, you will see that the context menu has three Git-related options. (If you don’t see them, log out and back in first.)

To turn my-git-project into a Git repository, simply select ‘Git Init Here’. If you now open the folder, you will see a subfolder called .git. This is where Git stores all the information about your repository. You never need to bother with this folder directly.

Now create an empty text document called readme.txt inside my-git-project. Right-click on readme.txt and select ‘Git Gui’. This will start a very simple Git user interface. Believe it or not, you will grow to love this ugly duckling!

The Git GUI is divided into four parts. On the top-left, you have the ‘Unstaged Changes’, which is a list of all files that have been created, deleted, or modified since the last commit. In this case, it only contains readme.txt. If you click on the icon next to readme.txt, the file will move to the ‘Staged Changes’ list on the bottom-left. ‘Staged Changes’ is a list of files that have been altered and that you want to take a snapshot of; in Git terminology, it is a list of changes that you want to commit.

At the bottom-right, there is a text box in which you describe your commit. It’s good practice to write a short but informative commit message, so that you can identify and understand your commit later on. Type something like ‘Add readme.txt’ and click on the ‘Commit’ button.

Congratulations, you have made your first commit! Whatever happens now, you will always be able to go back to the situation in which your project consisted of only an empty readme.txt.

Now open readme.txt in a text editor and type the following line of text: “These are some changes that I would like to commit.” Save readme.txt, switch back to Git GUI, and click on ‘Rescan’. Git will detect that readme.txt has been changed, and even shows a preview of the change in the yellow area on the top-right. (If you work with binary files, such as .doc documents, you will not see a meaningful preview: Git can tell you that the file has changed, but not what has changed.) Click on the icon next to readme.txt to ‘stage’ the file, type a commit message such as ‘Some changes to readme.txt’, and click on the ‘Commit’ button.
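
Roughly speaking, each of the Git GUI actions above corresponds to an ordinary Git command. ‘Git Init Here’ is git init, staging a file is git add, committing is git commit, and ‘Rescan’ is similar to git status and git diff. For the curious, here is a Python sketch that repeats the steps so far on the command line. It makes two assumptions. First, the git command must be on your PATH, which depends on the options you chose during installation. Second, the helper function git() and the placeholder author name and e-mail address are my own additions. The placeholder identity is only there because git commit refuses to run when Git doesn’t know who you are.

```python
import subprocess
import tempfile
from pathlib import Path


def git(repo, *args):
    """Run a git command inside the folder `repo` and return its output."""
    # The two -c options set a placeholder author name and e-mail address, so
    # that 'git commit' also works on a computer where Git has no identity yet.
    command = ["git", "-c", "user.name=Example Author",
               "-c", "user.email=author@example.com", *args]
    result = subprocess.run(command, cwd=repo, capture_output=True,
                            text=True, check=True)
    return result.stdout


# A fresh, empty my-git-project folder inside a temporary directory
repo = Path(tempfile.mkdtemp()) / "my-git-project"
repo.mkdir()

git(repo, "init")                        # like 'Git Init Here'

readme = repo / "readme.txt"
readme.write_text("")                    # an empty readme.txt
git(repo, "add", "readme.txt")           # move it to 'Staged Changes'
git(repo, "commit", "-m", "Add readme.txt")

readme.write_text("These are some changes that I would like to commit.\n")
print(git(repo, "status", "--short"))    # like 'Rescan': readme.txt is modified
print(git(repo, "diff"))                 # the preview of the change

git(repo, "add", "readme.txt")
git(repo, "commit", "-m", "Some changes to readme.txt")
print(git(repo, "log", "--oneline"))     # newest commit first
```

You never need to do this to follow the tutorial. The GUI does the same work behind the scenes.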

Bam! Your second commit.

Now, let’s say that we’re unhappy with the changes that we’ve just made to readme.txt, and that we want to undo them. The power of Git is (among other things) that you can jump back and forth between different snapshots of your project. Let’s take a look at how this works.

Right-click again on readme.txt (in Windows Explorer, not in Git GUI) and select ‘Git History’. This will open a somewhat complicated-looking window that describes the commit history of your project. The top part of the window lists all commits, with the commit message in the first column, the person who made the commit in the second, and the time of the commit in the third. Now, right-click on the commit labeled ‘Add readme.txt’ (i.e. our first commit) and select ‘Reset master branch to here’. You will be asked whether you want a ‘hard’, ‘soft’, or ‘mixed’ reset. For now, suffice it to say that a ‘hard’ reset really takes you back to the situation of the commit, so that’s what we will select here.

However, and this is a very important point, if you do a hard reset you will lose all work in your repository that has not been committed. In other words, before you do a hard reset, make sure that you have committed all changes that you want to preserve. If there are any changes listed under ‘Unstaged Changes’ in the Git GUI (click on ‘Rescan’ to be sure), they will be irretrievably lost after a hard reset.

With that important disclaimer out of the way, in this case we do select the hard reset. You will now see that readme.txt is empty again, just like it was before we did our second commit. We traveled back in time! Now right-click on the commit labeled ‘Some changes to readme.txt’ and do another hard reset. You will now see that readme.txt contains text again. We have traveled back to the future! (Strictly speaking, it’s more accurate to say that we have traveled back to a point in history that is closer to the present. Git does not yet support traveling to the future.)
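
The same round trip can be sketched on the command line, again driven from Python and again assuming that git is on your PATH. The sketch rebuilds the two-commit repository in a temporary folder and uses git rev-parse to remember the ID (hash) of each commit. It then wraps git reset --hard in a hypothetical helper, reset_hard(), as a safeguard against the data loss described above. The helper refuses to run while git status --porcelain still reports uncommitted changes. Jumping ‘forward’ again works because a reset does not delete the later commit, at least not right away. As long as you know its ID, you can reset to it.

```python
import subprocess
import tempfile
from pathlib import Path


def git(repo, *args):
    """Run a git command inside `repo` (placeholder identity) and return its output."""
    command = ["git", "-c", "user.name=Example Author",
               "-c", "user.email=author@example.com", *args]
    result = subprocess.run(command, cwd=repo, capture_output=True,
                            text=True, check=True)
    return result.stdout


def reset_hard(repo, commit):
    """Hard-reset to `commit`, but refuse if there is anything uncommitted."""
    # 'git status --porcelain' prints nothing when the folder is clean. This
    # check is stricter than needed: it also refuses for new, untracked files,
    # which a hard reset would normally leave alone.
    if git(repo, "status", "--porcelain").strip():
        raise RuntimeError("Uncommitted changes would be lost; commit them first.")
    git(repo, "reset", "--hard", commit)


# Rebuild the two commits from this tutorial in a temporary folder
repo = Path(tempfile.mkdtemp()) / "my-git-project"
repo.mkdir()
git(repo, "init")
readme = repo / "readme.txt"

readme.write_text("")
git(repo, "add", "readme.txt")
git(repo, "commit", "-m", "Add readme.txt")
first = git(repo, "rev-parse", "HEAD").strip()     # ID of the first commit

readme.write_text("These are some changes that I would like to commit.\n")
git(repo, "add", "readme.txt")
git(repo, "commit", "-m", "Some changes to readme.txt")
second = git(repo, "rev-parse", "HEAD").strip()    # ID of the second commit

reset_hard(repo, first)              # back in time
print(repr(readme.read_text()))      # ''
reset_hard(repo, second)             # 'back to the future'
print(repr(readme.read_text()))      # 'These are some changes that I would like to commit.\n'
```

The refusal in reset_hard() is a design choice for this sketch, not something Git does by itself. A plain git reset --hard will not ask before discarding your uncommitted work.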

Well, that’s it for now! A very, very basic introduction to Git. We have not even begun to explore what Git can do. We have not mentioned syncing your repository with a server such as GitHub. We have not mentioned how you can collaborate with others. We have not mentioned how you can work on different versions of the same project in parallel. And the simple workflow that I describe here is not necessarily the only or best one. (Some may frown on the potentially dangerous use of a hard reset.)

But for managing document versions in a one-man project, this is pretty much all you need. And once you know the basics, you can gradually extend your knowledge of Git to deal with more complex situations.