Reputation: 2801

Git: Fork vs submodules vs Subtree

I have an ecommerce project made up of three parts: an admin dashboard, an API (core), and a storefront (frontend). The dashboard and the API will be taken from open source repositories, and the storefront will be developed from scratch in its own repository. My question is: what is the best way to handle this in Git? Should I create submodules for the open source repositories and my frontend repository, or fork the open source repositories, work on the forks, and add the frontend?

I want to make some changes to the open source repositories, but also keep them synchronized with changes made in the original repositories.

EDIT 1

I'll use docker-compose to run the services.

EDIT 2

This is the open source repository that I want to use as a template to develop my ecommerce project; the difference is that I want to develop the storefront with a different technology...

Upvotes: 1

Answers (1)

torek

Reputation: 490078

... what is the best way to handle this on git?

"Best" questions are usually opinion questions, and hence off-topic on Stack Overflow.

What we can say from a technical perspective is this:

A fork is just a clone with some features. The features are determined by the hosting system: GitHub forks, Bitbucket forks, and so on are not exactly 100% identical in terms of added features, because each hosting provider wants to get you to use them and not the others. But they all start by cloning.
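Because a fork starts as a clone, keeping a modified fork in sync with the original comes down to ordinary remote operations. The following is only a sketch under stated assumptions, not any hosting provider's fork mechanism: it uses throwaway local repositories, the names `upstream`, `fork`, `api.txt`, and `custom.txt` are invented for illustration, and `git init -b` assumes Git 2.28 or later.

```shell
#!/bin/sh
# Sketch: a "fork" modeled as a plain clone that tracks the original repository.
# All repository and file names here are hypothetical.
set -eu

# Throwaway identity so commits work on a machine without Git configuration.
export GIT_AUTHOR_NAME=demo GIT_AUTHOR_EMAIL=demo@example.com
export GIT_COMMITTER_NAME=demo GIT_COMMITTER_EMAIL=demo@example.com

work=$(mktemp -d)
cd "$work"

# "upstream" stands in for the original open source repository.
git init -q -b main upstream
echo "v1" > upstream/api.txt
git -C upstream add api.txt
git -C upstream commit -q -m "upstream: initial commit"

# The fork: a clone whose remote is named "upstream" instead of "origin".
git clone -q --origin upstream upstream fork

# Your own change, committed in the fork.
echo "my change" > fork/custom.txt
git -C fork add custom.txt
git -C fork commit -q -m "fork: local customization"

# Meanwhile the original project moves on.
echo "v2" > upstream/api.txt
git -C upstream commit -q -am "upstream: update api"

# Synchronize: fetch the new upstream commits and merge them into the fork.
git -C fork fetch -q upstream
git -C fork merge -q --no-edit upstream/main

# Show the resulting history, including the merge commit.
git -C fork log --oneline --graph
```

With a hosted fork you would normally have two remotes, one for your fork on the hosting site and one for the original project; the fetch-and-merge step is the same.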
A submodule is just a Git repository. What makes it a submodule is that some other Git repository says: clone this Git repository, then check out commit _____ (fill in the blank with a hash ID). The other Git repository (the one with the clone-and-checkout instructions in it) we call a superproject, and this Git repository we call a submodule.

Submodules have a bunch of annoyances that come with them. The biggest one is implied by the blank above. The superproject Git repository is required to list the exact commit hash ID to use in the submodule. Any changes made to the submodule require making a change to the superproject as well: you must make a new commit in the superproject so that you list a new and different submodule hash ID.
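To make the "exact commit hash ID" point concrete, here is a small sketch using throwaway local repositories. The names `api`, `shop`, and `core.txt` are hypothetical, `git init -b` assumes Git 2.28 or later, and the `protocol.file.allow` override is there only because recent Git versions refuse to clone a submodule from a local path by default.

```shell
#!/bin/sh
# Sketch: the superproject records one exact submodule commit (a "gitlink").
# All repository and file names here are hypothetical.
set -eu

# Throwaway identity so commits work on a machine without Git configuration.
export GIT_AUTHOR_NAME=demo GIT_AUTHOR_EMAIL=demo@example.com
export GIT_COMMITTER_NAME=demo GIT_COMMITTER_EMAIL=demo@example.com

work=$(mktemp -d)
cd "$work"

# "api" stands in for the open source repository used as a submodule.
git init -q -b main api
echo "core" > api/core.txt
git -C api add core.txt
git -C api commit -q -m "api: initial commit"

# "shop" is the superproject.
git init -q -b main shop
cd shop
git -c protocol.file.allow=always submodule --quiet add "$work/api" api
git commit -q -m "shop: add api submodule"

# The tree entry has mode 160000 and holds a commit hash, not file contents.
git ls-tree HEAD api
before=$(git ls-tree HEAD api | awk '{print $3}')

# Commit a change inside the submodule's own repository.
echo "patch" >> api/core.txt
git -C api commit -q -am "api: local patch"

# The superproject still lists the old hash until it gets a commit of its own.
still=$(git ls-tree HEAD api | awk '{print $3}')
git add api
git commit -q -m "shop: move api submodule to the patched commit"
after=$(git ls-tree HEAD api | awk '{print $3}')

echo "before: $before"
echo "after:  $after"
```

Every change to the submodule therefore shows up as two commits: one in the submodule, and one in the superproject that updates the recorded hash.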

You cannot have the superproject refer to the submodule by branch name. You can put a name into the superproject, but the superproject Git mostly doesn't use the name. When it goes to command the submodule to get some commit, it does so by raw hash ID. So you're stuck dealing with raw hash IDs. There is nothing wrong with this, it's just something to be aware of.
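One place where the stored name does get used is `git submodule update --remote`. The sketch below, again with throwaway local repositories and hypothetical names (`api`, `shop`, `feature.txt`), records a branch name in `.gitmodules` and shows that the superproject's own entry is still a raw hash afterwards. It assumes Git 2.28 or later for `git init -b`, and overrides `protocol.file.allow` only because the demo clones from a local path.

```shell
#!/bin/sh
# Sketch: a branch name in .gitmodules is only a hint for "update --remote".
# All repository and file names here are hypothetical.
set -eu

# Throwaway identity so commits work on a machine without Git configuration.
export GIT_AUTHOR_NAME=demo GIT_AUTHOR_EMAIL=demo@example.com
export GIT_COMMITTER_NAME=demo GIT_COMMITTER_EMAIL=demo@example.com

work=$(mktemp -d)
cd "$work"

git init -q -b main api
echo "core" > api/core.txt
git -C api add core.txt
git -C api commit -q -m "api: initial commit"

git init -q -b main shop
cd shop
git -c protocol.file.allow=always submodule --quiet add "$work/api" api
# Record the branch name in .gitmodules.
git config -f .gitmodules submodule.api.branch main
git add .gitmodules
git commit -q -m "shop: add api submodule, tracking main"

# The original api repository gains a new commit.
echo "feature" > "$work/api/feature.txt"
git -C "$work/api" add feature.txt
git -C "$work/api" commit -q -m "api: new feature"

# A plain update checks out the recorded hash and ignores the branch name.
git -c protocol.file.allow=always submodule --quiet update api
plain=$(git -C api rev-parse HEAD)

# --remote consults the branch name and moves the submodule to its tip...
git -c protocol.file.allow=always submodule --quiet update --remote api
remote=$(git -C api rev-parse HEAD)

# ...but the superproject still lists the old hash until you commit the change.
recorded=$(git ls-tree HEAD api | awk '{print $3}')
git add api
git commit -q -m "shop: move api submodule to latest main"
```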

Submodules typically have the secondary annoyance that, to build the software, you need the superproject and all of the submodules. Any glitch in getting the submodules and checking out the right commits holds everything up. Such glitches are not common, but they do occur. On the other hand, since the superproject doesn't contain the submodule, the superproject can be small: in fact, a superproject that has no code of its own, and simply refers to submodules, can be tiny.

A subtree is neither of the above. When using git subtree, you take two plain (not-specially-handled) Git repositories. They each have their own branches and everything as usual. Now and then, you run git subtree with a subcommand: split to split something up, or merge to combine something back in. (The pull and push commands are merge and split in disguise, combined with a separate second Git command.)

Remembering that all Git repositories are primarily collections of commits, the split subcommand lets you take a collection of commits (usually less than every commit, but it could be every commit) in the "bigger" Git repository, and modify and extract some of those commits so they'll fit in the "smaller" Git repository. Meanwhile the merge subcommand takes the commits in the smaller repository, matches them up against an earlier split, figures out what's new since then, and "embiggens" them to fit back into the larger repository and adds them to that larger repository.
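A rough sketch of that round trip, using throwaway local repositories: the names `api`, `shop`, `api-export`, and the file names are invented for illustration. It assumes Git 2.28 or later for `git init -b`, and it assumes `git subtree` is installed, which is not guaranteed because it ships as a contributed command that some Git packages leave out.

```shell
#!/bin/sh
# Sketch: git subtree add / pull (merge) / split between two plain repositories.
# All repository, branch, and file names here are hypothetical.
set -eu

# Throwaway identity so commits work on a machine without Git configuration.
export GIT_AUTHOR_NAME=demo GIT_AUTHOR_EMAIL=demo@example.com
export GIT_COMMITTER_NAME=demo GIT_COMMITTER_EMAIL=demo@example.com

work=$(mktemp -d)
cd "$work"

# The "smaller" repository.
git init -q -b main api
echo "core" > api/core.txt
git -C api add core.txt
git -C api commit -q -m "api: initial commit"

# The "larger" repository.
git init -q -b main shop
cd shop
echo "storefront" > README.txt
git add README.txt
git commit -q -m "shop: initial commit"

# Bring the smaller repository's commits in under the api/ directory.
git subtree add --prefix=api "$work/api" main

# New work appears in the smaller repository...
echo "feature" > "$work/api/feature.txt"
git -C "$work/api" add feature.txt
git -C "$work/api" commit -q -m "api: new feature"

# ...and "pull" (a fetch plus a subtree merge) folds it into the larger one.
git subtree pull --prefix=api -m "shop: merge api updates" "$work/api" main

# A change made in the larger repository, inside the subtree directory.
echo "patch" > api/patch.txt
git add api/patch.txt
git commit -q -m "shop: patch the api"

# "split" rewrites the commits touching api/ so that api/ becomes the root,
# and stores the result on a new branch that fits the smaller repository.
git subtree split --prefix=api -b api-export

# The exported branch lists the files at its root, without the api/ prefix.
git ls-tree --name-only api-export
```

Note that `patch.txt` exists only in the larger repository and on the `api-export` branch at this point; getting it into the smaller repository takes a further push or pull of that branch.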

What this means is that both the smaller and larger repositories are self-contained, which was also true with the submodule approach, but that the larger repository is so self-contained that it never needs the smaller repository at all. It just has it available, if it has it available, to do more split-and-merge operations. Everything is in the larger repository. You don't have to get or use the smaller repository at all, ever, unless you want to do a new split and/or merge. You can give the smaller repository out to others, while keeping the large one private; changes people make to add to the smaller repository, you can try to take back later with git subtree merge.

Unlike submodules, this means the larger ("superproject"-ish) repository contains everything. So there's never any problem with the submodule not being available but, unlike submodules, the "superproject" is always the biggest thing around. You are not referring to some other Git repository's commits. You contain those commits, as transformed by the merge subcommand to fit into your own "superproject"-ish repository. So your repository can only be tiny if the whole project is tiny (in which case, why did you bother with subtrees?).

Upvotes: 6
