notNullGothik

Reputation: 432

Perforce: How do files get stored with branching?

A very basic question about branching and duplicating resources. I have had discussions about this because of the size of our main branch, but that aside, it would be good to know how this really works.

Consider the problem of branching dozens of GB. What happens when you create a branch of this massive amount of information?

I am reading the official docs here and here, but I am still confused about how the files are stored for each branch on the server.

Say a file A.txt exists in the main branch. When I create a branch (Xbranch) and A.txt has no changes, will the Perforce server duplicate A.txt (one copy holding the main changes and another for Xbranch)?

For a massive amount of data this matters, because it would mean duplicating dozens of GB. So how does this really work?

Upvotes: 1

Views: 2007

Answers (2)

Robert Cowham

Reputation: 422

Some notes in addition to Bryan Pendleton's answer (and the questions arising from it).

To really check your understanding of what is going on, it is good to experiment with a test repository containing a small number of files: create a checkpoint after each major action, then compare the checkpoints to see which database rows were actually written (and have a look at the archive files that the server maintains). This is very quick and easy to set up. You will notice that every branched file generates records in db.integed, db.rev, db.revcx and db.revhx, not to mention any in db.have.
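The checkpoint comparison described above can be roughed out with a few lines of Python. This is only a sketch: it assumes each checkpoint row names its table as `@db.<name>@`, and the sample rows are abbreviated, made-up stand-ins rather than real checkpoint output. `rows_per_table` and `new_rows` are hypothetical helper names.

```python
import re
from collections import Counter

# Assumption: every checkpoint row carries its table name as @db.<name>@.
TABLE_RE = re.compile(r"@(db\.[a-z.]+)@")

def rows_per_table(lines):
    """Count rows per database table in an iterable of checkpoint lines."""
    counts = Counter()
    for line in lines:
        match = TABLE_RE.search(line)
        if match:
            counts[match.group(1)] += 1
    return counts

def new_rows(before, after):
    """Return {table: change in row count} for tables whose count differs."""
    b, a = rows_per_table(before), rows_per_table(after)
    return {table: a[table] - b[table] for table in a if a[table] != b[table]}

# Abbreviated, hypothetical rows standing in for two checkpoints.
before = ["@pv@ 9 @db.rev@ @//depot/main/A.txt@ 1"]
after = before + [
    "@pv@ 9 @db.rev@ @//depot/Xbranch/A.txt@ 1",
    "@pv@ 4 @db.integed@ @//depot/Xbranch/A.txt@ @//depot/main/A.txt@",
]
print(new_rows(before, after))
```

With real checkpoints you would pass two open file objects instead of the lists, since the helpers only need something that yields lines.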

You also need to be aware of which server version you are using as the behavior has been enhanced over time. Check the output of "p4 help obliterate":

Obliterate is aware of lazy copies made when 'p4 integrate' creates a branch, and does not remove copies that are still in use. Because of this, obliterating files does not guarantee that the corresponding files in the archive will be removed.

Some other points:

  • The default flags for "p4 integrate" to create branches copied the files down to the client workspace and then copied them back to the server with the submit. This took time depending on how many and how big the files were. It has long been possible to avoid this using the -v (virtual) flag, which just creates the appropriate rows on the server and avoids updating the client workspace - usually hugely faster. The possible slight downside is you have to sync the files afterwards to work on them.
  • Newer releases of Perforce have the "p4 populate" command which does the same as an "integrate -v" but also does not actually require the target files to be mapped into the current client workspace - this avoids the dreaded "no target file(s) in client view" error which many beginners have struggled with! [In P4V this is the "Branch files..." command on right click menu, rather than "Merge/Integrate..."]
  • Streams have made branching a lot slicker and easier in many ways, and are well worth reading up on and playing with (the only potential flies in the ointment are the flat 2 level naming hierarchy and the possible challenges of migrating existing branches with existing relationships into streams)
  • Task streams are pretty nifty and save lots of space on the server
  • Obliterate has had an interesting -b flag for a few releases, which lets you quickly and easily remove unchanged branch files, a bit like retroactively creating a task stream. This can potentially save millions of database rows in larger installations with lots of branching
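To make the commands in the list above concrete, here is a small Python sketch that only builds the argument lists and does not run anything. The helper names and depot paths are hypothetical, and the flags should be checked against "p4 help" for your server version before relying on them.

```python
def branch_with_integrate(src, dst, desc):
    """Older style: virtual integrate (-v), followed by a submit."""
    return [
        ["p4", "integrate", "-v", src, dst],
        ["p4", "submit", "-d", desc],
    ]

def branch_with_populate(src, dst, desc):
    """Newer style: a single populate, no client mapping of the target needed."""
    return [["p4", "populate", "-d", desc, src, dst]]

def preview_obliterate_unchanged(branch):
    """Obliterate -b without -y, which should only report what would be removed."""
    return [["p4", "obliterate", "-b", branch]]

# Hypothetical depot paths matching the question's example.
SRC = "//depot/main/..."
DST = "//depot/Xbranch/..."

for argv in branch_with_populate(SRC, DST, "Create Xbranch"):
    print(" ".join(argv))
```

Each list can be handed to `subprocess.run` on a machine where the p4 client is installed and configured.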

Upvotes: 4

Bryan Pendleton

Reputation: 16339

In general, branching a file does not create a copy of the file's contents; instead, the Perforce server just writes an additional database record describing the new revision, but shares the single copy of the file's contents.

Perforce refers to these as "lazy copies"; you can learn more about them here: http://answers.perforce.com/articles/KB_Article/How-to-Identify-a-Lazy-Copy-of-a-File

One exception is the "+S" filetype modifier: in this case each branch has its own copy of the content, so that the +S semantics can be applied properly on each branch independently.
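As a rough illustration of the idea, here is a toy Python model in which a revision record points at archive content instead of holding it. It is not how the Perforce server is implemented: `ToyServer`, its methods, and the `own_copy` switch (standing in for the +S case) are all made up for this sketch.

```python
from dataclasses import dataclass, field

@dataclass
class ToyServer:
    archive: dict = field(default_factory=dict)    # archive key -> content bytes
    revisions: dict = field(default_factory=dict)  # depot path -> archive key

    def add(self, path, content):
        """Store new content and a record pointing at it."""
        self.archive[path] = content
        self.revisions[path] = path

    def branch(self, src, dst, own_copy=False):
        """Branch src to dst; lazily by default, with a real copy if own_copy."""
        if own_copy:
            # Models the +S exception: the branch gets its own archive entry.
            self.archive[dst] = self.archive[self.revisions[src]]
            self.revisions[dst] = dst
        else:
            # Lazy copy: only a new record, pointing at the source's content.
            self.revisions[dst] = self.revisions[src]

    def edit(self, path, content):
        """An edit gives the path its own archive entry."""
        self.archive[path] = content
        self.revisions[path] = path

    def stored_bytes(self):
        return sum(len(content) for content in self.archive.values())

server = ToyServer()
server.add("//depot/main/A.txt", b"x" * 1000)
server.branch("//depot/main/A.txt", "//depot/Xbranch/A.txt")
print(server.stored_bytes())  # 1000: the branch added a record, not content
server.edit("//depot/Xbranch/A.txt", b"y" * 10)
print(server.stored_bytes())  # 1010: only the edited revision takes new space
```

In this model, branching dozens of GB of unchanged files costs one record per file, and storage grows only as files are edited on the branch.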

Upvotes: 3
