F0UF

Reputation: 83

Use regex to get comment block from a git log

I'm stuck on something that might be very simple, but I can't find a solution. I've only been using Python for a few days, and I need to use a regex to extract part of a file.

I saved the output of git log -p to a file, and now I want to extract some information from it. The only thing I can't extract is the comment block.

This block sits between a Date line and either a diff line or the end of the log.

...
Date:   Wed Jul 3 22:32:36 2013 +0200

    Here is the comment
    of a commit

    and I have to
    extract it

diff --git a/dir.c b/dir.c
...

...
Date:   Wed Jul 3 22:32:36 2013 +0200

    Here is the comment
    of a commit

    and I have to
    extract it

So I tried this:

commentBlock = re.compile("(?<=Date:.{32}\n\n).+(?=|\n\ndiff)", re.M|re.DOTALL)
findCommentBlock = re.findall(commentBlock,commitBlock[i]) # I split my git log every time I find a "commit" line.

Problems are :

P.S. I'm working with Python 3.x and my script is almost finished, so I'd rather not switch to a dedicated tool like GitPython (which only works on 2.x).

Upvotes: 3

Views: 310

Answers (3)

woemler

Reputation: 7169

Give this a try:

re.findall(r'Date:.+?\n\s*(.+?)\s*(?:diff|$)', text, re.S)

This should return a list of comment entries, assuming that all of the log entries follow the same pattern you have laid out here.
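For anyone who wants to try it, here is a small check of that pattern. The sample strings are made up to resemble the excerpt in the question, and the names PATTERN, sample and tricky are only for illustration. It also shows one caveat as I understand it: because the pattern looks for a bare "diff" anywhere, a comment containing a word such as "different" gets cut short.

```python
import re

# Pattern from the answer above, written as a raw string.
PATTERN = r'Date:.+?\n\s*(.+?)\s*(?:diff|$)'

# Hypothetical sample, modelled on the excerpt in the question.
sample = (
    "Date:   Wed Jul 3 22:32:36 2013 +0200\n"
    "\n"
    "    Here is the comment\n"
    "    of a commit\n"
    "\n"
    "diff --git a/dir.c b/dir.c\n"
)
found = re.findall(PATTERN, sample, re.S)
print(found)  # one entry; inner lines keep their indentation

# Caveat: the bare "diff" alternative also matches inside ordinary words.
tricky = (
    "Date:   Wed Jul 3 22:32:36 2013 +0200\n"
    "\n"
    "    Fix different bug\n"
    "\n"
    "diff --git a/dir.c b/dir.c\n"
)
truncated = re.findall(PATTERN, tricky, re.S)
print(truncated)  # the comment stops before "different"
```

If that matters for your log, anchoring the terminator to the start of a line (for example with re.M and ^diff) is one way around it.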

Upvotes: 0

FMc

Reputation: 42421

Here's one way to do it:

rgx = re.compile(r'^Date: .+?\n+(.+?)(?:^diff |\Z)', re.MULTILINE | re.DOTALL)
comments = rgx.findall(txt)

A few notes:

  • I don't think you need to worry about the length of the Date line.
  • Capture the part you care about. This has two implications. (1) To ignore the Date line, just consume (non-greedily) everything through the first newlines. (2) You don't need a lookahead assertion; a non-capturing group (?:...) will work fine.
  • It's probably a good idea to make the captured wildcard non-greedy as well: .+?.
  • You can indicate the end of a string in a regex with \Z. Thus, the non-capturing group means: (a) a line beginning with "diff " or (b) end of string.
  • More details on regex features can be found in the excellent Python docs.
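To make the notes above concrete, here is a minimal sketch that applies the same regex to one commit block at a time, as the question does after splitting on "commit" lines. The helper name extract_comment and the sample text (hashes, author) are my own placeholders, and the textwrap.dedent step is an extra I added to drop git's four-space indent.

```python
import re
import textwrap

# Same pattern as above: skip the Date line, capture lazily up to a line
# starting with "diff " or the end of the string.
RGX = re.compile(r'^Date: .+?\n+(.+?)(?:^diff |\Z)', re.MULTILINE | re.DOTALL)


def extract_comment(commit_block):
    """Return the dedented commit message from one commit's text, or None."""
    match = RGX.search(commit_block)
    if match is None:
        return None
    # Remove the common leading indent, then surrounding blank lines.
    return textwrap.dedent(match.group(1)).strip()


# Hypothetical commit blocks modelled on the question's excerpt.
with_diff = (
    "commit 1111111\n"
    "Author: Someone <someone@example.com>\n"
    "Date:   Wed Jul 3 22:32:36 2013 +0200\n"
    "\n"
    "    Here is the comment\n"
    "    of a commit\n"
    "\n"
    "    and I have to\n"
    "    extract it\n"
    "\n"
    "diff --git a/dir.c b/dir.c\n"
    "index 0000000..1111111\n"
)
without_diff = (
    "commit 2222222\n"
    "Author: Someone <someone@example.com>\n"
    "Date:   Wed Jul 3 22:32:36 2013 +0200\n"
    "\n"
    "    Here is the comment\n"
    "    of a commit\n"
)

print(extract_comment(with_diff))
print(extract_comment(without_diff))
```

Working per commit block keeps the \Z branch safe: on an unsplit log, a commit without a diff would otherwise run on into the next commit's header.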

Upvotes: 1

Assaf Lavie

Reputation: 75983

Though the date may change in length, it is definitely terminated by a new-line, so why limit the number of characters at all?

Anyway, you should be able to do something like {32,33} to capture the range.
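One thing to watch if you try the range in the original pattern: as far as I know, Python's re module only accepts fixed-width look-behinds, so {32,33} inside (?<=...) fails to compile. A rough sketch of a workaround is to match the Date line normally and capture what follows. The helper name comment_after_date and the sample blocks are made up for illustration.

```python
import re

# A variable-width quantifier inside a look-behind is rejected by re.
try:
    re.compile(r"(?<=Date:.{32,33}\n\n).+", re.DOTALL)
    lookbehind_ok = True
except re.error:
    lookbehind_ok = False

# Workaround: consume the Date line as part of the match and capture the rest,
# stopping before a blank line followed by "diff " or at the end of the string.
RANGED = re.compile(
    r"^Date:.{32,33}\n\n(.+?)(?=\n\ndiff |\Z)",
    re.MULTILINE | re.DOTALL,
)


def comment_after_date(block):
    """Return the stripped text after the Date line, or None if no match."""
    match = RANGED.search(block)
    return match.group(1).strip() if match else None


# Hypothetical blocks: a one-digit day (32 chars after "Date:") and a
# two-digit day (33 chars).
one_digit = "Date:   Wed Jul 3 22:32:36 2013 +0200\n\n    msg one\n\ndiff --git a/x b/x\n"
two_digit = "Date:   Sat Jul 13 22:32:36 2013 +0200\n\n    msg two\n"

print(lookbehind_ok)
print(comment_after_date(one_digit))
print(comment_after_date(two_digit))
```

Dropping the length limit altogether, as suggested above, avoids the problem entirely; the range is only needed if you want to keep the original shape of the pattern.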

Upvotes: 0
