Reputation: 453
Using version 4.1.1 of Mercurial, I would like to provide a file specifying a bunch of files as args to an hg cat command, so that each file is output to a different file. I thought the following would work:
hg cat -o 'catOut-%s' --include listfile:files.lst
where files.lst looks like this
foo01.txt
foo02.txt
But it yields an error message saying "invalid arguments" plus a usage message.
Here is an MWE that sets up a code repository with the required structure and then tries running the cat command shown above.
hg init mwe
cd mwe
echo abc > foo01.txt
echo def > foo02.txt
echo PQR > baz.txt
echo files.lst > .hgignore
hg add .hgignore
hg add foo*.txt
hg add baz.txt
echo foo01.txt >> files.lst
echo foo02.txt >> files.lst
hg ci -m "Adding all files"
hg cat -o 'catOut-%s' baz.txt
cat catOut-baz.txt
rm catOut*
hg cat -o 'catOut-%s' --include listfile:files.lst baz.txt
cat catOut-baz.txt
hg cat -o 'catOut-%s' --include listfile:files.lst
Here is a trace of these commands and their results as typed to a shell:
~/tmp $ hg init mwe
~/tmp $ cd mwe
~/tmp/mwe $ echo abc > foo01.txt
~/tmp/mwe $ echo def > foo02.txt
~/tmp/mwe $ echo PQR > baz.txt
~/tmp/mwe $ echo files.lst > .hgignore
~/tmp/mwe $ hg add .hgignore
~/tmp/mwe $ hg add foo*.txt
~/tmp/mwe $ hg add baz.txt
~/tmp/mwe $ echo foo01.txt >> files.lst
~/tmp/mwe $ echo foo02.txt >> files.lst
~/tmp/mwe $ hg ci -m "Adding all files"
~/tmp/mwe $ hg cat -o 'catOut-%s' baz.txt
~/tmp/mwe $ cat catOut-baz.txt
PQR
~/tmp/mwe $ rm catOut*
~/tmp/mwe $ hg cat -o 'catOut-%s' --include listfile:files.lst baz.txt
~/tmp/mwe $ cat catOut-baz.txt
cat: catOut-baz.txt: No such file or directory
~/tmp/mwe $ hg cat -o 'catOut-%s' --include listfile:files.lst
hg cat: invalid arguments
hg cat [OPTION]... FILE...
output the current or given revision of files
options ([+] can be repeated):
-o --output FORMAT print output to file with formatted name
-r --rev REV print the given revision
--decode apply any matching decode filter
-I --include PATTERN [+] include names matching the given patterns
-X --exclude PATTERN [+] exclude names matching the given patterns
(use 'hg cat -h' to show more help)
~/tmp/mwe $
You have to supply a file argument to avoid the error message. But that argument is ignored if an --include and an -o are supplied.
I suspect no one has ever used the --include argument to cat before, because there is a dearth of explanation out there about how --include arguments are handled. Either that or I'm overlooking something obvious.
Upvotes: 1
Views: 288
Reputation: 488453
You have to supply a file argument to avoid the error message. But that argument is ignored if an --include and an -o are supplied.
It is not literally ignored. The problem is that --include means something odd.
... because there is a dearth of explanation out there about how --include arguments are handled.
That does seem to be the case! There is a description in hg help patterns but it is rather inadequate (in my opinion at least). What --include means is that only files matching the patterns in the file are used. Think of this as "include only", rather than "also include".
Thus, if your listfile has those two file names in it, you may run, e.g.:
hg cat -o 'catOut-%s' --include listfile:files.lst baz.txt foo01.txt
and Mercurial will extract foo01.txt since it's in the list, while skipping baz.txt, which is not.
You might think you could use:
hg cat -o 'catOut-%s' --include listfile:files.lst '*'
but you can't (well, you can on Windows, as hg does glob style matching there, but that's the wrong approach). The right trick is to direct hg cat to read a directory, namely the top level directory of the repository (written as . here, assuming you run the command from the repository root):
hg cat .
(though there are similar methods, such as using set:*; see hg help filesets). Then the filtering produced by --include strips you down to just the files you want included.
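To make the "include only" behaviour concrete, here is a toy model in Python. It is only a sketch of the semantics described above, not Mercurial's matcher: the select function and its exact-name matching are my own simplification, and the manifest list just mirrors the files from the question's MWE.

```python
def select(manifest, pats, include):
    """Toy model: a file is chosen only if it matches a positional
    argument AND is named in the include list."""
    def matches_pat(path):
        # "." stands for the repository root, i.e. every tracked file.
        return any(p == "." or p == path for p in pats)

    return [f for f in manifest if matches_pat(f) and f in include]


# Tracked files from the MWE, and the names listed in files.lst.
manifest = [".hgignore", "baz.txt", "foo01.txt", "foo02.txt"]
include = {"foo01.txt", "foo02.txt"}

print(select(manifest, ["baz.txt"], include))               # []
print(select(manifest, ["baz.txt", "foo01.txt"], include))  # ['foo01.txt']
print(select(manifest, ["."], include))                     # ['foo01.txt', 'foo02.txt']
```

The first call corresponds to the run in the question where catOut-baz.txt was never created: baz.txt was not ignored, it was filtered out because it is not in the list.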
(This is just side stuff I found while researching this answer a bit. I wondered how one made hg cat scan every file in a revision, so I plunged into the source.)
For reference, here is the snippet of Python code that implements hg cat:
@command('cat',
    [('o', 'output', '',
     _('print output to file with formatted name'), _('FORMAT')),
    ('r', 'rev', '', _('print the given revision'), _('REV')),
    ('', 'decode', None, _('apply any matching decode filter')),
    ] + walkopts,
    _('[OPTION]... FILE...'),
    inferrepo=True)
def cat(ui, repo, file1, *pats, **opts):
    """output the current or given revision of files

    Print the specified files as they were at the given revision. If
    no revision is given, the parent of the working directory is used.

    Output may be to a file, in which case the name of the file is
    given using a format string. The formatting rules as follows:

    :``%%``: literal "%" character
    :``%s``: basename of file being printed
    :``%d``: dirname of file being printed, or '.' if in repository root
    :``%p``: root-relative path name of file being printed
    :``%H``: changeset hash (40 hexadecimal digits)
    :``%R``: changeset revision number
    :``%h``: short-form changeset hash (12 hexadecimal digits)
    :``%r``: zero-padded changeset revision number
    :``%b``: basename of the exporting repository

    Returns 0 on success.
    """
    ctx = scmutil.revsingle(repo, opts.get('rev'))
    m = scmutil.match(ctx, (file1,) + pats, opts)

    ui.pager('cat')
    return cmdutil.cat(ui, repo, ctx, m, '', **opts)
The most critical line is:
def cat(ui, repo, file1, *pats, **opts):
This means that non-option FILE... arguments (as in the description just before the def) are bound with the first one going to file1 and the rest going to *pats (as a Python tuple). This forces you to pass one or more file-name or file-set arguments.
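The binding is easy to try in isolation. The function below is a stand-in I wrote with the same signature as the real one; it does nothing except report how the positional arguments were bound. Calling it without any FILE argument raises a TypeError, which is presumably what Mercurial's dispatcher reports as "invalid arguments" (I have not traced that part).

```python
def cat(ui, repo, file1, *pats, **opts):
    # Stand-in with the same signature as the command function;
    # it only shows where the FILE arguments end up.
    return file1, pats


# One or more FILE arguments: the first goes to file1, the rest to pats.
print(cat("ui", "repo", "baz.txt", "foo01.txt"))  # ('baz.txt', ('foo01.txt',))

# No FILE argument at all: Python itself refuses the call, because
# file1 is a required positional parameter.
try:
    cat("ui", "repo", output="catOut-%s")
except TypeError as exc:
    print("rejected:", exc)
```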
Those file name arguments (baz.txt or whatever) are passed in to scmutil.match, which is what finds the files in the manifest for the specified revision. That revision is the one now in ctx, obtained by the previous line calling scmutil.revsingle, which gets the last revision in the --rev option, defaulting to the current revision (the first parent of the working directory).
It's scmutil.match that handles the --include option. Unfortunately this code is rather impenetrable:
    m = ctx.match(pats, opts.get('include'), opts.get('exclude'),
                  default, listsubrepos=opts.get('subrepos'), badfn=badfn)
(with pats being the non-empty file names passed in as command line arguments), which invokes this code in context.py:
    def match(self, pats=None, include=None, exclude=None, default='glob',
              listsubrepos=False, badfn=None):
        if pats is None:
            pats = []
        r = self._repo
        return matchmod.match(r.root, r.getcwd(), pats,
                              include, exclude, default,
                              auditor=r.nofsauditor, ctx=self,
                              listsubrepos=listsubrepos, badfn=badfn)
which gets us into match.py's class match object, which is what implements the listfile: part. Here's a bit from that:
        matchfns = []
        if include:
            kindpats = self._normalize(include, 'glob', root, cwd, auditor)
            self.includepat, im = _buildmatch(ctx, kindpats, '(?:/|$)',
                                              listsubrepos, root)
            roots, dirs = _rootsanddirs(kindpats)
            self._includeroots.update(roots)
            self._includedirs.update(dirs)
            matchfns.append(im)
and self._normalize winds up reading the file given as the listfile argument, so that's what is in kindpats. (The string literal passed to _buildmatch is a regular expression glob suffix pattern, i.e., file names from the include file are followed by an implied trailing slash or end-of-string.)
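The effect of that suffix can be checked with Python's re module alone. This is a sketch under one assumption: I treat each name from the list as a literal string and simply append the suffix quoted above. The include_regex helper is hypothetical; the real pattern building in match.py does more than this.

```python
import re


def include_regex(name):
    # Hypothetical helper: escape the literal name, then require that it
    # is followed by a slash or by the end of the string.
    return re.compile(re.escape(name) + r'(?:/|$)')


rx = include_regex("foo01.txt")
for path in ["foo01.txt", "foo01.txt/nested", "foo01.txt.bak", "foo010.txt"]:
    print(path, bool(rx.match(path)))
```

Only the first two paths match. The suffix is what lets a listed name stand for either a file or a directory with everything below it, without also matching longer names that merely share the prefix.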
Upvotes: 1