Reputation: 26723
I see a lot of sites referring to git, github, svn, subversion etc, but I never really knew what all of those things are. I also hear a lot of terms like 'svn repo', 'commit', and 'push' - I tried googling but it seems that I have so little knowledge about the subject that I don't even know where to get started.
Could someone give me the initial push so I can continue doing research on my own? What are these things all about?
Thanks!
guys: thank you so much for all the really long and encompassing explanations. I wish I could choose more than one answer, but unfortunately SO doesn't allow that (they should have a vote 1st, 2nd, and 3rd place feature or something). thank you all very much!
Upvotes: 13
Views: 804
Reputation: 323972
"The Git Parable" by Tom Preston-Werner (mojombo), one of the people behind GitHub, describes how a version control system such as Git might have been made, and at the same time explains why one would want and need a (distributed) version control system.
See also "A Visual Guide to Version Control" article at Better Explained.
There are many advantages to using a version control system. Let's list them roughly in order of increasing complexity: increasing number of developers, increasing project size / project history size, more complex workflows, etc.
Even if you are the single (only) developer of your project, and (at least for the time being) you do not plan to change that, a version control system is still useful. It allows you to:
Go back to some working version. If you are working on your project and you realize that you completely screwed up, that the approach you tried doesn't work and you don't know how to make it work, it is nice to be able to simply go back to the last working version and start anew.
This means that you should commit, i.e. make a snapshot of your changes, when you have a working version (well, there are exceptions, see below). To avoid losing too much work you should commit fairly often, ideally (see below) when you have completed a single feature, a single issue, or a single part of a feature or issue.
You would also want to know what you did, and what you were working on lately. This means that you should describe each changeset (each commit).
Annotate file / browse history. Unless you have perfect memory, you will sometimes want to know why (and when, and, when there are multiple developers, also who) you wrote a given set of lines. Comments are not always enough. For that you can use (if your version control system provides it) line-wise file history annotations (scm annotate or scm blame), or other similar tools like the so-called "pickaxe" search in Git, where you search/browse history for commits that introduced or deleted a given string.
For this to be useful you need to write good commit messages, describing the change and the intent of the change, so you would know why the change was made.
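To make the "pickaxe" idea a bit more concrete, here is a minimal Python sketch. It is only loosely modeled on what Git does: the history is a plain list of (commit message, file contents) pairs that I made up for illustration, and the `pickaxe` function is a hypothetical helper, not part of any real tool.

```python
def pickaxe(history, needle):
    """Return the messages of commits that changed how often `needle` occurs."""
    hits = []
    previous = ""  # contents before the first commit: an empty file
    for message, content in history:
        if content.count(needle) != previous.count(needle):
            hits.append(message)
        previous = content
    return hits


# A made-up history of one file, oldest commit first.
history = [
    ("Add config loader", "def load():\n    return {}\n"),
    ("Add retry limit", "MAX_RETRIES = 3\ndef load():\n    return {}\n"),
    ("Document loader", "MAX_RETRIES = 3\ndef load():\n    '''Load.'''\n    return {}\n"),
    ("Drop retry limit", "def load():\n    '''Load.'''\n    return {}\n"),
]

found = pickaxe(history, "MAX_RETRIES")
print(found)
```

Only the commits that introduced or deleted the string are reported, and the commit messages are all you get back, which is exactly why good messages matter.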
Bisect history to find errors. Modern version control systems offer an alternative way of finding bugs (alternative to inserting print statements or using a debugger), at least in some cases. When you notice a bug, or get a bug report, and the bug is not the result of the last change, you can use the version control system (scm bisect) to automatically find the commit that introduced the bug (the first commit that has the given bug). The version control system finds such a commit using bisection on the project history, retrieving (checking out) versions which you mark as good (without the bug) or bad, until it finds the commit that introduced the bug.
For that you should always ensure that a version works (or at least compiles) before committing it, otherwise you won't be able to decide whether a commit has the bug or not. You should keep commits small (with not many changes), so that when you find the commit that introduced the bug you only have to check a small number of lines affected by the change. You would also need good commit messages, so you would know why the change was made (and decide whether the change is correct or not).
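The bisection itself is just a binary search over history. Here is a rough Python sketch under simplifying assumptions: the history is linear, the first commit is known good, the last is known bad, and once the bug appears every later commit has it. The names `find_first_bad` and `is_bad` are made up for illustration; in real life the good/bad test is you (or a script) checking each retrieved version.

```python
def find_first_bad(commits, is_bad):
    """Binary search for the first bad commit in a linear history.

    Assumes commits[0] is good and commits[-1] is bad.
    Returns the first bad commit and the list of commits that were tested.
    """
    good, bad = 0, len(commits) - 1
    tested = []
    while bad - good > 1:
        mid = (good + bad) // 2
        tested.append(commits[mid])
        if is_bad(commits[mid]):
            bad = mid    # bug is already present here, look earlier
        else:
            good = mid   # still fine here, look later
    return commits[bad], tested


commits = [f"c{i}" for i in range(16)]  # c0 (oldest) .. c15 (newest)
# Pretend the bug was introduced in c11.
first_bad, tested = find_first_bad(commits, lambda c: int(c[1:]) >= 11)
print(first_bad, tested)
```

With 16 commits only 4 versions need to be checked instead of up to 14, which is why this scales well to long histories.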
Later on you would need another feature of a version control system: the ability to work in parallel on different lines of development (flavors) of your project, so-called branches. This includes but is not limited to:
Tagging releases. When you release a new version of your project to a larger public, you would want to tag (mark) the released version. This way, when somebody tells you that version X.Y of your project has a bug, you would be able to check out this version and check whether you can reproduce the bug (and perhaps find the bug via bisection, see above). This might be of use even if you are not releasing your project, if you have possibly different versions deployed in different places.
For this tags need to be immutable (of course).
Long-lived branches. Let's assume that you released your project, and somebody found a bug. You would probably want to be able to put out (release) a fixed version without stopping work on new features, and without shipping a version from development which might be unstable and contain multiple other bugs. You would also want to have the bugfix in the version that you are working on (if it was not fixed independently).
For this you would use long-lived branches: a maintenance branch where you would commit only bugfixes, and a development branch (or trunk) where you would do new work, introducing new features etc. There might be more branches with varying stability. For example, the Git project has four such branches: 'maint' for bugfixes, 'master' for changes that are quite stable, 'next' for development work, and 'pu' or "proposed updates" branch. In other workflows you have a separate maintenance (bugfix) branch for each release.
To quote Joel Spolsky: "Keeping stable and dev code separate is precisely what source code control is supposed to let you do."
Topic (feature) branches. When you want to work on multiple issues in parallel, where each feature takes multiple commits to finish, you would probably want to develop each feature (each topic) in a separate branch. This way you would be able to switch from working on one feature to working on another feature (on another topic).
This workflow is especially important if you are working with multiple developers, see below.
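One way to picture branches and tags: a branch is a named pointer that moves forward with every commit, while a tag is a named pointer that never moves. The toy Python model below is only a sketch of that idea; the class `ToyHistory` and all its method names are invented for illustration and do not correspond to any real tool's API.

```python
class ToyHistory:
    def __init__(self):
        self.parents = {}   # commit id -> parent commit id (None for the first)
        self.branches = {}  # branch name -> commit id at its tip (moves)
        self.tags = {}      # tag name -> commit id (never moves)

    def commit(self, branch, commit_id):
        """Add a commit on top of the branch and advance the branch."""
        self.parents[commit_id] = self.branches.get(branch)
        self.branches[branch] = commit_id

    def branch(self, new_branch, from_branch):
        """Start a new branch at the current tip of an existing one."""
        self.branches[new_branch] = self.branches[from_branch]

    def tag(self, name, branch):
        """Mark the current tip of a branch; tags are immutable."""
        if name in self.tags:
            raise ValueError(f"tag {name!r} already exists")
        self.tags[name] = self.branches[branch]

    def log(self, commit_id):
        """List commits reachable from commit_id, newest first."""
        out = []
        while commit_id is not None:
            out.append(commit_id)
            commit_id = self.parents[commit_id]
        return out


h = ToyHistory()
h.commit("master", "initial")
h.commit("master", "feature-1")
h.tag("v1.0", "master")           # release
h.branch("maint", "master")       # maintenance branch for v1.0
h.commit("master", "feature-2")   # new work continues on master
h.commit("maint", "bugfix-1")     # bugfix goes to maint only

print(h.log(h.branches["master"]))
print(h.log(h.branches["maint"]))
print(h.tags["v1.0"])
```

The two branches share the history up to the release and then diverge, while the tag keeps pointing at exactly what was released. Getting the bugfix into the development branch as well (merging) is left out of this sketch.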
One of the most important features of a version control system is that it enables collaboration between different developers, allowing multiple people to work on the same project without stomping on each other's changes. This feature is well described in other responses, so I won't elaborate on it.
See also "Understanding Version Control", a work in progress by Eric S. Raymond (author of, among others, "The Cathedral and the Bazaar" and "The Art of Unix Programming"), for a description of the various methods that version control systems use to allow collaboration.
Upvotes: 7
Reputation: 713
Git and Subversion (also known as svn) are both source control, version control, or revision control systems (the terms are used interchangeably). They help you manage source code and track a history of the changes to each file managed by the system. The Wikipedia article that metismo links to might be helpful.
GitHub is a service to host and manage Git repositories. It basically puts the repository online to make it easy for multiple people to interact with the repository.
The commit command generally stores a set of changes into the source control repository. This creates a new revision in the repository.
The push command only applies to distributed version control systems like Git or Mercurial (also known as hg). Push allows changes to be moved from one repository to another. The notion of distributed version control systems is that each user has their own repository. As a user completes changes, the user pushes them to other repositories (perhaps a central project repository, or as a patch for another user's repository).
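As a very rough illustration of commit versus push, here is a toy Python model. Everything in it (the `ToyRepository` class, commits as plain strings) is made up for the example; real systems also have to deal with histories that have diverged, which this sketch ignores completely.

```python
class ToyRepository:
    def __init__(self, commits=()):
        self.commits = list(commits)  # commit ids, oldest first

    def clone(self):
        """Every user gets a full copy of the repository."""
        return ToyRepository(self.commits)

    def commit(self, commit_id):
        """Record a change in THIS repository only."""
        self.commits.append(commit_id)

    def push(self, other):
        """Copy the commits that `other` is missing; return what was sent."""
        missing = [c for c in self.commits if c not in other.commits]
        other.commits.extend(missing)
        return missing


central = ToyRepository(["initial"])
alice = central.clone()

alice.commit("add login form")      # local only
alice.commit("fix typo")            # local only
before_push = list(central.commits)  # central has not seen anything yet

sent = alice.push(central)
print(before_push, sent, central.commits)
```

The point is that committing and publishing are two separate steps: the commits exist in the user's own repository until they are pushed somewhere else.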
The point of these systems is to
Upvotes: 4
Reputation: 13299
Version control (a.k.a. revision control).
Consider the following problem. You're working on a project with someone else and you're sharing files. You both need to work on, say, "WhateverController.java". It's a huge file and you both need to edit it.
The most primitive way to deal with this is to not edit the file at the same time, but then both of you have to be on the same page. When you've got a team, especially a team with dozens, hundreds, or thousands of members (typical for open-source projects), this becomes completely impossible.
An old, primitive "solution" to this problem was to have a checkout/checkin mechanism. When you need to edit a file, you "check it out", and the file is locked so no one else can edit it until you unlock it by "checking it in". This is done through the appropriate software, for example Microsoft's breathtakingly stupid piece of crap SourceSafe. But when people forget to "check the file in", then no one else can edit that file while it's in use. Then someone goes on vacation or leaves the project for some other reason and the result is unending chaos, confusion and usually quite a bit of lost code. This adds tremendous management work.
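To show why the locking model gets in the way, here is a small Python sketch of it. The `LockingStore` class and the user names are hypothetical; this is not how SourceSafe is actually implemented, just the general checkout/checkin idea.

```python
class LockingStore:
    """Hypothetical lock-based store: one editor per file at a time."""

    def __init__(self):
        self.locks = {}  # file path -> user currently holding the lock

    def check_out(self, path, user):
        holder = self.locks.get(path)
        if holder is not None and holder != user:
            raise PermissionError(f"{path} is locked by {holder}")
        self.locks[path] = user

    def check_in(self, path, user):
        if self.locks.get(path) != user:
            raise PermissionError(f"{user} does not hold the lock on {path}")
        del self.locks[path]


store = LockingStore()
store.check_out("WhateverController.java", "alice")

try:
    store.check_out("WhateverController.java", "bob")
    blocked = False
except PermissionError:
    blocked = True  # bob has to wait until alice checks the file in

store.check_in("WhateverController.java", "alice")
store.check_out("WhateverController.java", "bob")  # now it works
print(blocked, store.locks)
```

If alice goes on vacation before the check-in, bob stays blocked, which is exactly the chaos described above.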
Then came CVS, and subsequently Subversion, which the authors call "CVS done right", so CVS and Subversion are essentially the same idea. With those, there is no actual check out. You just edit the files you need and check them in. Note that the actual files are stored on a central server, and each user runs the software on their own workstations as well. This location on the server is called a repository.
Now, what happens if two people are working on the same file in CVS/Subversion? Their changes are merged, typically using GNU diff and patch. 'diff' is a utility that extracts the difference between two files. 'patch' uses such 'diff' files to patch other files.
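If you want to see what a diff looks like without installing anything, Python's standard difflib module can produce one. This is only an illustration of the idea using difflib, not the GNU tools themselves, and the two file versions are made up.

```python
import difflib

# Two versions of the same (made-up) file, as lists of lines.
original = ["def greet():\n", "    print('hello')\n"]
edited = ["def greet():\n", "    print('hello, world')\n"]

# A unified diff: lines starting with '-' were removed, '+' were added.
diff_lines = list(difflib.unified_diff(original, edited, fromfile="A", tofile="B"))
print("".join(diff_lines))

# A diff carries enough information to rebuild the edited version.
# difflib's ndiff/restore pair shows that round trip.
delta = list(difflib.ndiff(original, edited))
rebuilt = list(difflib.restore(delta, 2))  # 2 = reconstruct the second input
```

The diff only records what changed, which is what makes it possible to apply the same change to a slightly different copy of the file.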
So if you're working on WhateverController.java in one function, and I'm working on the same file in a different function, then when you're done with your stuff, you simply check it in, and the changes are applied to the file on the server. Meanwhile, my local copy has no idea of your changes so your changes do not affect my code at all. When I'm done with my changes, I check the file in as well. But now we have this seemingly complicated scenario.
Let's call the original WhateverController.java, file A. You edit the file, and the result is file B. I edit the same file at a different location, without your changes, and this file is file C.
Now we seemingly have a problem. The changes in files B and C are both changes to file A. A ridiculously backwards piece of junk like SourceSafe or Dreamweaver will usually end up overwriting the changes of file B (because it got checked in first).
CVS/Subversion and presumably Git (which I know almost nothing about) create patches instead of just overwriting files.
The difference between file A and C is produced and becomes patch X. The difference between A and B is produced and becomes patch Y.
Then patches X and Y are both applied to file A, so the end result is file A + the changes made to B and C on our respective workstations.
Usually this works flawlessly. Sometimes we might be working on the same function in the same code, in which case CVS/Subversion will notify the programmer of a conflict and present it within the file itself. Those conflicts are usually easily fixed; at least I've never had any problem solving them. Graphical utilities such as Visual Studio, Project Builder (Mac OS X) and the like usually show you both files and the conflicts, so you can choose which lines you want to keep and which to throw away... and you can also edit the file manually if you want to resolve the conflict by hand.
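The A/B/C scenario can be sketched in a few lines of Python. This is a deliberately simplified line-based merge built on difflib, not what CVS or Subversion actually run; the function names `edits` and `merge` are made up, and the conflict rule is conservative (changes that touch or overlap the same lines of A count as a conflict unless they are identical).

```python
import difflib


def edits(base, changed):
    """Describe `changed` as (start, end, new_lines) edits against `base`."""
    matcher = difflib.SequenceMatcher(a=base, b=changed, autojunk=False)
    return [(i1, i2, tuple(changed[j1:j2]))
            for tag, i1, i2, j1, j2 in matcher.get_opcodes() if tag != "equal"]


def merge(base, yours, mine):
    """Apply both sets of edits to base; raise ValueError on a conflict."""
    your_edits, my_edits = edits(base, yours), edits(base, mine)
    for a in your_edits:
        for b in my_edits:
            touching = a[0] <= b[1] and b[0] <= a[1]
            if touching and a != b:
                raise ValueError(f"conflict near line {max(a[0], b[0]) + 1}")
    merged = list(base)
    # Apply from the bottom of the file up so earlier line numbers stay valid.
    for start, end, new in sorted(set(your_edits) | set(my_edits), reverse=True):
        merged[start:end] = new
    return merged


file_a = ["def a():", "    return 1", "", "def b():", "    return 2"]
file_b = ["def a():", "    return 10", "", "def b():", "    return 2"]   # your edit
file_c = ["def a():", "    return 1", "", "def b():", "    return 20"]   # my edit

merged = merge(file_a, file_b, file_c)
print(merged)
```

Here the two edits are in different functions, so both end up in the result. If both of us had changed the same line differently, `merge` would raise instead of silently picking a winner, and a human would have to sort it out.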
So in essence, source control is a solution to the problem of multiple people working on the same files. That's basically it.
I hope this explains.
EDIT: There are many other benefits with decent source control systems like Subversion and presumably Git. If there's a problem, you can go back to other versions so you don't have to keep manual backups of everything. In fact, at least with Subversion, if I mess something up or want to take a look at an old version of the code, I can do so without interfering with anyone else's work.
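The "go back to other versions" part can be pictured with a toy model where every commit stores a full snapshot of the files. This is only a sketch: `ToyRepo` and its methods are invented for illustration, and real systems store history far more efficiently than copying everything each time.

```python
import copy
from dataclasses import dataclass, field


@dataclass
class Commit:
    message: str    # what changed and why
    snapshot: dict  # file name -> file contents at commit time


@dataclass
class ToyRepo:
    history: list = field(default_factory=list)

    def commit(self, files, message):
        """Store a snapshot of the files; return its revision number."""
        self.history.append(Commit(message, copy.deepcopy(files)))
        return len(self.history) - 1

    def checkout(self, revision):
        """Return a copy of the files as they were at that revision."""
        return copy.deepcopy(self.history[revision].snapshot)

    def log(self):
        return [f"r{i}: {c.message}" for i, c in enumerate(self.history)]


repo = ToyRepo()
files = {"main.py": "print('hello')\n"}
good = repo.commit(files, "Add greeting")

files["main.py"] = "print('hello'\n"          # an experiment that breaks the file
repo.commit(files, "Try new output format")

files = repo.checkout(good)                   # back to the last working version
print(repo.log())
```

Because `checkout` hands back a copy, looking at an old version does not change the stored history or anyone else's working files.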
Upvotes: 23
Reputation: 70238
Git, Subversion and the like are all about version control. If you use such technologies for a project, all your source files are stored in a so-called repository (a.k.a. "repo"), except for files that don't need versioning (big files, user-specific files, ...).
Some advantages of version control are:
Hope that explained the terms you mentioned. I think a good start to get going with version control is Subversion, using TortoiseSVN for Windows if possible. There's even a free book about it - Version Control with Subversion.
Upvotes: 9
Reputation: 96606
Have a look at chapter one of the (free, online) Subversion book. It describes what version control systems (such as Subversion) are about.
Upvotes: 2
Reputation: 67346
Source code repositories.
Basically a way to share code, between a team, with the ability to see who "committed" (added) what code at what time, and who changed what at what time, etc.
Upvotes: 2
Reputation: 457
They are all different versions of source control:
http://en.wikipedia.org/wiki/Revision_control
Upvotes: 8