Reputation: 35
I am writing a script to remove git commit tags (e.g. Signed-off-by:, Reviewed-by:) from each git commit message. The script is currently written in Python. Right now I have a very simple re.match("Signed-off-by:", line) check, but I think there should be a more elegant solution using a regular expression.
I am assuming that a footer begins with [one or more words separated by -]: for example
Bug:, Issue:, Reviewed-by:, Tested-by:, Ack-by:, Suggested-by:, Signed-off-by:
The pattern should ignore case. I need help coming up with a solution using regular expressions for this. I also want to learn more about regular expressions; what is a good starting point?
The actual python script is here https://gerrit-review.googlesource.com/#/c/33213/2/tools/gitlog2asciidoc.py
You could also comment on the script if you sign up for an account.
Thanks
Upvotes: 1
Views: 951
Reputation: 3801
I’m assuming that the original question was about removing all trailers.
I am assuming that a footer begins with [one or more words separated by -]:
From man git-interpret-trailers:
Existing trailers are extracted from the input message by looking for a group of one or more lines that (i) is all trailers, or (ii) contains at least one Git-generated or user-configured trailer and consists of at least 25% trailers. The group must be preceded by one or more empty (or whitespace-only) lines. The group must either be at the end of the message or be the last non-whitespace lines before a line that starts with --- (followed by a space or the end of the line). Such three minus signs start the patch part of the message. See also --no-divider below.
(git version 2.39.2)
(I will ignore the “divider line” part since that is irrelevant for commit messages.)
That sounds too involved for a regex.
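To see why, here is a minimal Python sketch of rule (i) alone: the last paragraph is treated as the trailer block only if every line in it looks like a trailer. The name `split_trailers` and the token pattern are assumptions made for illustration. The sketch ignores the 25% rule, configured trailers, and folded continuation lines, so it is not a substitute for git interpret-trailers.

```python
import re

# Assumed token shape: letters, digits and hyphens, followed by a colon and a value.
TRAILER_LINE = re.compile(r"^[A-Za-z0-9-]+:\s*\S")

def split_trailers(message):
    """Return (body, trailers) using rule (i) only: the last paragraph counts
    as a trailer block when every one of its lines looks like a trailer."""
    lines = message.rstrip().split("\n")
    # Walk back to the blank line that precedes the last paragraph.
    start = len(lines)
    while start > 0 and lines[start - 1].strip():
        start -= 1
    block = lines[start:]
    # start == 0 means no blank line precedes the block (e.g. subject only).
    if start == 0 or not all(TRAILER_LINE.match(line) for line in block):
        return message.rstrip(), []
    body = "\n".join(lines[:start]).rstrip()
    return body, block
```

Even this simplified version needs paragraph detection around the pattern, which is the part a single regular expression handles poorly.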
git interpret-trailers can already parse trailers for you. Done. Right? Not quite. git interpret-trailers has the --only-trailers option, but not the dual --whole-message-except-trailers (or something). So it looks like we have to do some work.[1]
Get the whole commit message of a SHA1:
git log -1 --format='%s%n%n%b' 688ce90c53d7565f6f8e1d5e438b960620630448
In this example that would be:
Bug: bad documentation of commit conventions
It has come to my attention that some of our committers don’t know how
Signed-off-by: trailers are supposed to be used. Unacceptable! Let me
elucidate this in our docs.
Make haste!
Keywords: nitpicking
Cautioned-against-by: Victor Version Control <[email protected]>
Reviewed-by: Sophia Change My Mind <[email protected]>
Nacked-by: Hector Relaxed <[email protected]>
Yawned-at-by: Yellow Baggers <[email protected]>
I want to remove the five last lines.
We can use grep --invert-match --fixed-strings. The problem though is that we need to exclude several different lines at once: keep only the lines that match none of the trailers. We can do that by giving one --regexp option per trailer:
grep --invert-match --fixed-strings --regexp='Keywords: nitpicking' […]
And we can build up that command using (sigh)… Bash.
#!/usr/bin/env bash
commit=688ce90c53d7565f6f8e1d5e438b960620630448
# Collect the grep arguments in an array so that no `eval` or manual quoting is needed
grep_args=(--invert-match --fixed-strings)
trailers=$(git log -1 --format='%s%n%n%b' "$commit" \
    | git interpret-trailers --only-trailers)
while IFS= read -r trailer; do
    # One `--regexp=<trailer>` per trailer line
    grep_args+=(--regexp="$trailer")
done <<< "$trailers"
# `git log` reprise
git log -1 --format='%s%n%n%b' "$commit" \
    | grep "${grep_args[@]}"
Output:
Bug: bad documentation of commit conventions
It has come to my attention that some of our committers don’t know how
Signed-off-by: trailers are supposed to be used. Unacceptable! Let me
elucidate this in our docs.
Make haste!
It seems this outputs one or two newlines extra at the end. I guess that can be postprocessed away.
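The same idea can be sketched in Python, which also takes care of the trailing blank lines. The helper names `strip_trailers` and `commit_message_without_trailers` are hypothetical; the only git commands used are the two from the script above. Unlike the grep version, this sketch compares whole lines rather than substrings.

```python
import subprocess

def strip_trailers(message, trailers):
    """Drop every line of `message` that equals a line of `trailers`,
    then trim trailing blank lines."""
    trailer_lines = {line for line in trailers.splitlines() if line.strip()}
    kept = [line for line in message.splitlines() if line not in trailer_lines]
    return "\n".join(kept).rstrip() + "\n"

def commit_message_without_trailers(rev):
    """Fetch the message of `rev` and let git decide which lines are trailers."""
    message = subprocess.run(
        ["git", "log", "-1", "--format=%s%n%n%b", rev],
        capture_output=True, text=True, check=True,
    ).stdout
    trailers = subprocess.run(
        ["git", "interpret-trailers", "--only-trailers"],
        input=message, capture_output=True, text=True, check=True,
    ).stdout
    return strip_trailers(message, trailers)
```

Inside a repository, something like print(commit_message_without_trailers("HEAD"), end="") would then show the message without its trailer block.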
[1] git log supports formats like %(trailers) and %b (body), but seemingly not body-except-trailers.
Upvotes: 0
Reputation: 14209
>>> import re
>>> def match_commit(s):
...     r = re.compile(r'((\w+-)*\w+:)')
...     return re.match(r, s) is not None
...
>>> match_commit("Signed-off-by:")
True
>>> match_commit("Signed-off+by:")
False
>>> match_commit("Signed--by:")
False
>>> match_commit("Bug:")
True
>>> match_commit("Bug-:")
False
The group (\w+-)* matches zero or more repetitions of the pattern "word + '-'"; the final \w+: matches the last word followed by ':'.
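As a hedged sketch of how this pattern could be applied to the original task, the snippet below drops every line that starts with such a token. The function name `drop_footer_lines` is an assumption. Since \w already matches both upper- and lower-case letters, no case flag is needed. Note that a purely line-based filter like this also removes a subject line such as "Bug: ..." that merely looks like a footer.

```python
import re

# The pattern described above, with a non-capturing group.
# re.match anchors it at the start of each line.
FOOTER = re.compile(r"(?:\w+-)*\w+:")

def drop_footer_lines(message):
    """Return `message` without the lines that begin with a footer-like token."""
    return "\n".join(
        line for line in message.split("\n") if not FOOTER.match(line)
    )
```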
Upvotes: 1
Reputation: 123662
This is a nice use case for any:
# prefixes: lower-case strings such as 'signed-off-by:'
for line in logfile:
    if any(line.lower().startswith(prefix) for prefix in prefixes):
        print(line)
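A small variation, sketched under the assumption that the goal is to keep the non-footer lines: str.startswith also accepts a tuple of prefixes, so the any call can be folded into a single test. The names `PREFIXES` and `without_footers` are illustrative, and the prefix list is the one from the question.

```python
# Lower-case prefixes taken from the question; extend as needed.
PREFIXES = ("bug:", "issue:", "reviewed-by:", "tested-by:",
            "ack-by:", "suggested-by:", "signed-off-by:")

def without_footers(lines):
    """Yield only the lines that do not start with a known footer prefix."""
    for line in lines:
        # startswith() accepts a tuple and is True if any prefix matches.
        if not line.lower().startswith(PREFIXES):
            yield line
```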
Upvotes: 0
Reputation: 40394
While the regular expression approach would be nice, since a single flag lets you ignore case, I think that in this case you can just use startswith to achieve the same goal:
prefixes = ['bug:', 'issue:', 'reviewed-by:', 'tested-by:',
            'ack-by:', 'suggested-by:', 'signed-off-by:']
...
lower_line = line.lower()
for prefix in prefixes:
    if lower_line.startswith(prefix):
        print('prefix matched:', prefix)
        break
else:
    # Runs only when the loop finishes without hitting break
    print('no match found')
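For comparison, the regular-expression variant mentioned at the start of this answer might look like the sketch below, which builds one alternation from the same prefix list and passes re.IGNORECASE. The names `FOOTER_RE` and `is_footer` are assumptions made for this example.

```python
import re

prefixes = ['bug:', 'issue:', 'reviewed-by:', 'tested-by:',
            'ack-by:', 'suggested-by:', 'signed-off-by:']

# re.escape guards against prefixes that contain regex metacharacters.
FOOTER_RE = re.compile('|'.join(re.escape(p) for p in prefixes), re.IGNORECASE)

def is_footer(line):
    """True if `line` starts with one of the prefixes, ignoring case."""
    return FOOTER_RE.match(line) is not None
```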
Upvotes: 1