Reputation: 83
I'm stuck on something that might be very simple, but I can't find a solution. I've been using Python for a few days and I need to use a regex to extract part of a file.
I put the output of git log -p into a file, and now I want to extract some information from it. The only thing I can't extract is the comment block.
This block lies between a date line AND (a diff line OR the end of the list).
...
Date: Wed Jul 3 22:32:36 2013 +0200
Here is the comment
of a commit
and I have to
extract it
diff --git a/dir.c b/dir.c
...
...
Date: Wed Jul 3 22:32:36 2013 +0200
Here is the comment
of a commit
and I have to
extract it
So I tried this:
commentBlock = re.compile("(?<=Date:.{32}\n\n).+(?=|\n\ndiff)", re.M|re.DOTALL)
findCommentBlock = re.findall(commentBlock,commitBlock[i]) # I split my git log every time I find a "commit" line.
The problems are:
The date line varies in length: I need Date:.{32} if the day of the month is between the 1st and the 9th, or Date:.{33} if the day has two digits.
I don't know how to express "stop at diff OR when it's the end of the list (or the file)".
P.S. I'm working with Python 3.x and I've almost finished my script, so I don't really want to use a dedicated tool like GitPython (which only works on 2.x).
Upvotes: 3
Views: 310
Reputation: 7169
Give this a try:
re.findall(r'Date:.+?\n\s*(.+?)\s*(?:diff|$)', text, re.S)
This should return a list of comment entries, assuming that all of the log entries follow the same pattern you have laid out here.
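As a rough check of that pattern, here is a minimal sketch run against made-up commit blocks that mirror the layout in the question. The helper name extract_comments_simple and the sample strings are hypothetical, not part of the original answer. The third sample illustrates one caveat: because the pattern stops at the bare word "diff", a comment that mentions "diff" is cut short.

```python
import re

# Pattern from the answer above, written as a raw string.
COMMENT_RE = re.compile(r'Date:.+?\n\s*(.+?)\s*(?:diff|$)', re.S)

def extract_comments_simple(text):
    """Return the text captured after each Date: line (hypothetical helper)."""
    return COMMENT_RE.findall(text)

# Made-up samples: a commit followed by a diff, and a commit at the end of the log.
with_diff = (
    "Date:   Wed Jul 3 22:32:36 2013 +0200\n"
    "\n"
    "Here is the comment\n"
    "of a commit\n"
    "\n"
    "diff --git a/dir.c b/dir.c\n"
)
at_end = (
    "Date:   Sat Jul 13 22:32:36 2013 +0200\n"
    "\n"
    "Here is the comment\n"
    "of a commit\n"
)
# Caveat sample: the comment itself contains the word "diff".
mentions_diff = "Date:   Wed Jul 3 22:32:36 2013 +0200\n\nfix diff output\n"

for sample in (with_diff, at_end, mentions_diff):
    print(extract_comments_simple(sample))
```

If comments may contain the word "diff", anchoring the terminator to the start of a line (for example with re.M and ^diff followed by a space) is one way to avoid the early cut-off.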
Upvotes: 0
Reputation: 42421
Here's one way to do it:
rgx = re.compile(r'^Date: .+?\n+(.+?)(?:^diff |\Z)', re.MULTILINE | re.DOTALL)
comments = rgx.findall(txt)
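To see the pattern above at work, here is a small self-contained sketch on a made-up two-commit log. The hashes, author, and diff contents are invented for illustration, and the .strip() call is an addition to drop the blank lines that the capture group keeps at the end of each comment.

```python
import re

rgx = re.compile(r'^Date: .+?\n+(.+?)(?:^diff |\Z)', re.MULTILINE | re.DOTALL)

# Invented sample log: the first commit has a diff, the last one ends the text.
txt = (
    "commit 1111111\n"
    "Author: A <a@example.com>\n"
    "Date:   Wed Jul 3 22:32:36 2013 +0200\n"
    "\n"
    "Here is the comment\n"
    "of a commit\n"
    "\n"
    "diff --git a/dir.c b/dir.c\n"
    "+int x;\n"
    "commit 2222222\n"
    "Author: A <a@example.com>\n"
    "Date:   Sat Jul 13 22:32:36 2013 +0200\n"
    "\n"
    "Second comment\n"
)

# The raw captures keep trailing newlines, so strip them for display.
comments = [c.strip() for c in rgx.findall(txt)]
for c in comments:
    print(repr(c))
```

Note that the one-digit and two-digit days are both handled, since the pattern never counts characters on the Date line.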
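A few notes:
A non-capturing group, (?:...), will work fine here; no lookaround is needed.
The quantifiers are non-greedy: .+?
The end of the string is matched by \Z. Thus, the non-capturing group means: (a) a line beginning with "diff " or (b) end of string.
Upvotes: 1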
Reputation: 75983
Though the date may change in length, it is always terminated by a newline, so why limit the number of characters at all?
Anyway, you should be able to use something like {32,33} to cover the range, though not inside the lookbehind: Python's re module only accepts fixed-width lookbehind patterns.
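A minimal sketch of the first suggestion, assuming each commit block looks like the ones in the question: instead of counting characters, consume the Date line up to its newline and capture what follows. The names DATE_THEN_COMMENT and comment_of are hypothetical. The first few lines also check that the standard re module rejects a variable-width lookbehind such as (?<=Date:.{32,33}\n\n).

```python
import re

# A lookbehind containing {32,33} has no fixed width, so re refuses to compile it.
try:
    re.compile(r"(?<=Date:.{32,33}\n\n).+", re.DOTALL)
    lookbehind_ok = True
except re.error:
    lookbehind_ok = False

# Alternative: match the whole Date line, whatever its length, then capture
# lazily up to a blank line followed by "diff ", or up to the end of the string.
DATE_THEN_COMMENT = re.compile(
    r"^Date:[^\n]*\n\n(.+?)(?=\n\ndiff |\Z)", re.MULTILINE | re.DOTALL
)

def comment_of(commit_block):
    """Return the comment of one commit block, or None if no Date line is found."""
    m = DATE_THEN_COMMENT.search(commit_block)
    return m.group(1).strip() if m else None

# Made-up blocks with a one-digit and a two-digit day of the month.
block_with_diff = (
    "Date:   Wed Jul 3 22:32:36 2013 +0200\n\n"
    "Here is the comment\nof a commit\n\n"
    "diff --git a/dir.c b/dir.c\n"
)
block_at_end = "Date:   Sat Jul 13 22:32:36 2013 +0200\n\nSecond comment\n"

print(lookbehind_ok)
print(repr(comment_of(block_with_diff)))
print(repr(comment_of(block_at_end)))
```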
Upvotes: 0