airfoyle

Mercurial cat command using --include

Using Mercurial 4.1.1, I would like to give hg cat a file that lists the files to output, so that each listed file is written to its own output file. I thought the following would work:

hg cat -o 'catOut-%s' --include listfile:files.lst 

where files.lst looks like this

foo01.txt
foo02.txt

But it yields an error message saying "invalid arguments" plus a usage message.

Here is an MWE that sets up a code repository with the required structure and then tries running the cat command shown above.

hg init mwe
cd mwe
echo abc > foo01.txt
echo def > foo02.txt
echo PQR > baz.txt
echo files.lst > .hgignore
hg add .hgignore
hg add foo*.txt
hg add baz.txt
echo foo01.txt >> files.lst
echo foo02.txt >> files.lst
hg ci -m "Adding all files"
hg cat -o 'catOut-%s' baz.txt
cat catOut-baz.txt
rm catOut*
hg cat -o 'catOut-%s' --include listfile:files.lst baz.txt
cat catOut-baz.txt
hg cat -o 'catOut-%s' --include listfile:files.lst 

Here is a trace of these commands and their results as typed to a shell:

~/tmp $ hg init mwe
~/tmp $ cd mwe
~/tmp/mwe $ echo abc > foo01.txt
~/tmp/mwe $ echo def > foo02.txt
~/tmp/mwe $ echo PQR > baz.txt
~/tmp/mwe $ echo files.lst > .hgignore
~/tmp/mwe $ hg add .hgignore
~/tmp/mwe $ hg add foo*.txt
~/tmp/mwe $ hg add baz.txt
~/tmp/mwe $ echo foo01.txt >> files.lst
~/tmp/mwe $ echo foo02.txt >> files.lst
~/tmp/mwe $ hg ci -m "Adding all files"
~/tmp/mwe $ hg cat -o 'catOut-%s' baz.txt
~/tmp/mwe $ cat catOut-baz.txt
PQR
~/tmp/mwe $ rm catOut*
~/tmp/mwe $ hg cat -o 'catOut-%s' --include listfile:files.lst baz.txt
~/tmp/mwe $ cat catOut-baz.txt
cat: catOut-baz.txt: No such file or directory
~/tmp/mwe $ hg cat -o 'catOut-%s' --include listfile:files.lst 
hg cat: invalid arguments
hg cat [OPTION]... FILE...

output the current or given revision of files

options ([+] can be repeated):

 -o --output FORMAT       print output to file with formatted name
 -r --rev REV             print the given revision
    --decode              apply any matching decode filter
 -I --include PATTERN [+] include names matching the given patterns
 -X --exclude PATTERN [+] exclude names matching the given patterns

(use 'hg cat -h' to show more help)
~/tmp/mwe $ 

You have to supply a file argument to avoid the error message. But that argument is ignored if an --include and an -o are supplied.

I suspect no one has ever used the --include argument to cat before, because there is a dearth of explanation out there about how --include arguments are handled. Either that or I'm overlooking something obvious.

Answers (1)

torek

"You have to supply a file argument to avoid the error message. But that argument is ignored if an --include and an -o are supplied."

It is not literally ignored. The problem is that --include means something odd.

"... because there is a dearth of explanation out there about how --include arguments are handled."

That does seem to be the case! There is a description in hg help patterns, but it is rather inadequate (in my opinion at least). What --include means is that, of the files named by the FILE arguments, only those that also match the include patterns are used. Think of this as "include only", rather than "also include".

Thus, if your listfile has those two file names in it, you may run, e.g.:

hg cat -o 'catOut-%s' --include listfile:files.lst baz.txt foo01.txt

and Mercurial will extract foo01.txt, since it is in the list, while skipping baz.txt, which is not.

You might think you could use:

hg cat -o 'catOut-%s' --include listfile:files.lst '*'

but you can't (well, you can on Windows, where hg does its own glob-style matching, but that's the wrong approach). The right trick is to direct hg cat to read a directory, namely the top-level directory of the repository:

hg cat -o 'catOut-%s' --include listfile:files.lst .

(though there are similar methods, such as using set:*; see hg help filesets). Then the filtering produced by --include strips you down to just the files you want included.
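The "include only" rule can be sketched with a toy model. This is not Mercurial's code: the helper names matches_pats and selected are made up for illustration, the patterns are treated as exact names, and "." is assumed to stand for the repository root and so to cover every tracked file.

```python
# Toy model of the selection rule described above (NOT Mercurial's code).
# matches_pats and selected are hypothetical names used only for illustration.

def matches_pats(path, pats):
    # Assumption: "." means the repository root, which covers every file.
    return "." in pats or path in pats


def selected(manifest, pats, include):
    # A file is output only if it matches a FILE argument AND an include pattern.
    return [p for p in manifest if matches_pats(p, pats) and p in include]


manifest = [".hgignore", "baz.txt", "foo01.txt", "foo02.txt"]
include = ["foo01.txt", "foo02.txt"]  # the contents of files.lst

print(selected(manifest, ["baz.txt"], include))               # []
print(selected(manifest, ["baz.txt", "foo01.txt"], include))  # ['foo01.txt']
print(selected(manifest, ["."], include))                     # ['foo01.txt', 'foo02.txt']
```

The first call mirrors the question's case where catOut-baz.txt was never created: baz.txt matches the FILE argument but not the include list, so nothing is left to output.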

More "color", as they say in some circles - no need to read this!

(This is just side stuff I found while researching this answer a bit. I wondered how one made hg cat scan every file in a revision, so I plunged into the source.)

For reference, here is the snippet of Python code that implements hg cat:

@command('cat',
    [('o', 'output', '',
     _('print output to file with formatted name'), _('FORMAT')),
    ('r', 'rev', '', _('print the given revision'), _('REV')),
    ('', 'decode', None, _('apply any matching decode filter')),
    ] + walkopts,
    _('[OPTION]... FILE...'),
    inferrepo=True)
def cat(ui, repo, file1, *pats, **opts):
    """output the current or given revision of files

    Print the specified files as they were at the given revision. If
    no revision is given, the parent of the working directory is used.

    Output may be to a file, in which case the name of the file is
    given using a format string. The formatting rules as follows:

    :``%%``: literal "%" character
    :``%s``: basename of file being printed
    :``%d``: dirname of file being printed, or '.' if in repository root
    :``%p``: root-relative path name of file being printed
    :``%H``: changeset hash (40 hexadecimal digits)
    :``%R``: changeset revision number
    :``%h``: short-form changeset hash (12 hexadecimal digits)
    :``%r``: zero-padded changeset revision number
    :``%b``: basename of the exporting repository

    Returns 0 on success.
    """
    ctx = scmutil.revsingle(repo, opts.get('rev'))
    m = scmutil.match(ctx, (file1,) + pats, opts)

    ui.pager('cat')
    return cmdutil.cat(ui, repo, ctx, m, '', **opts)

The most critical line is:

def cat(ui, repo, file1, *pats, **opts):

This means that non-option FILE... arguments (as in the description just before the def) are bound with the first one going to file1 and the rest going to *pats (as a Python tuple). This forces you to pass one or more file-name or file-set arguments.
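To see why the signature alone forces at least one FILE argument, here is a small stand-in with the same parameter list. The body is a placeholder, not the real implementation, and the link between the resulting TypeError and the "invalid arguments" message is my assumption about how hg's dispatcher reports a signature mismatch.

```python
# Stand-in with the same signature as hg's cat command; the body is a
# placeholder that just shows how the positional arguments get bound.
def cat(ui, repo, file1, *pats, **opts):
    return (file1,) + pats


# Two FILE arguments: the first binds to file1, the rest to pats.
print(cat("ui", "repo", "baz.txt", "foo01.txt", include=["listfile:files.lst"]))
# ('baz.txt', 'foo01.txt')

# No FILE argument: Python cannot bind file1, so the call itself fails.
try:
    cat("ui", "repo", include=["listfile:files.lst"])
except TypeError as exc:
    # Assumption: hg reports this kind of failure as "invalid arguments".
    print("TypeError:", exc)
```

Note that the --include option arrives through **opts, so it can never satisfy the file1 parameter no matter what it contains.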

Those file-name arguments (baz.txt or whatever) are passed to scmutil.match, which finds the files in the manifest for the specified revision. That revision is the one now in ctx, obtained on the previous line by calling scmutil.revsingle, which takes the last revision given by the --rev option and defaults to the current revision (the first parent of the working directory).

It's scmutil.match that handles the --include option. Unfortunately this code is rather impenetrable:

m = ctx.match(pats, opts.get('include'), opts.get('exclude'),
              default, listsubrepos=opts.get('subrepos'), badfn=badfn)

(with pats being the non-empty file names passed in as command line arguments), which invokes this code in context.py:

def match(self, pats=None, include=None, exclude=None, default='glob',
          listsubrepos=False, badfn=None):
    if pats is None:
        pats = []
    r = self._repo
    return matchmod.match(r.root, r.getcwd(), pats,
                          include, exclude, default,
                          auditor=r.nofsauditor, ctx=self,
                          listsubrepos=listsubrepos, badfn=badfn)

which gets us into match.py's class match object, which is what implements the listfile: part. Here's a bit from that:

    matchfns = []
    if include:
        kindpats = self._normalize(include, 'glob', root, cwd, auditor)
        self.includepat, im = _buildmatch(ctx, kindpats, '(?:/|$)',
                                          listsubrepos, root)
        roots, dirs = _rootsanddirs(kindpats)
        self._includeroots.update(roots)
        self._includedirs.update(dirs)
        matchfns.append(im)

and self._normalize winds up reading the file given as the listfile argument, so that's what is in kindpats. (The string literal passed to _buildmatch is a regular-expression suffix appended to each glob pattern, i.e., each file name from the include file must be followed by a slash or by end-of-string.)
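The effect of that suffix can be tried out with Python's re module. This is only an approximation: I am assuming that escaping the name and appending the suffix is close to the regex hg builds for a plain file name, which is not how _buildmatch is actually written.

```python
import re

# Suffix copied from the _buildmatch call above.
suffix = r"(?:/|$)"

# Assumption: a plain name from files.lst becomes roughly "escaped name + suffix".
pattern = re.compile(re.escape("foo01.txt") + suffix)

for path in ["foo01.txt", "foo01.txt/sub", "foo01.txt.orig"]:
    print(path, bool(pattern.match(path)))
# foo01.txt True
# foo01.txt/sub True
# foo01.txt.orig False
```

So an include entry matches the named file itself, or anything beneath it if it is a directory, but not a longer name that merely starts with the same characters.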
