Praveen

Reputation: 2187

Python regex matching for multiline text

I have a log file with the below content

commit da83ddfdfb36f0c48ab2137efaa8c81a6bb41993
Author: ”abc <[email protected]>
Commit: ”abc <[email protected]>
..
..

I am trying to create regex matching expression as below

TEST_COMMIT = 'commit\ (?P<commit>[a-f0-9]+)\n(?P<author>Author.*)\n'
RE_COMMIT = re.compile(TEST_COMMIT, re.MULTILINE | re.VERBOSE)

This matches fine on regex101 (https://regex101.com/) but does not work in my code.

I want to get the commit ID and the Author info as separate group expressions. So

commit group should be : `da83ddfdfb36f0c48ab2137efaa8c81a6bb41993`
author group should be : `Author: ”abc <[email protected]>`

My python version is 2.7.12

Any comments on what I am doing wrong?

Upvotes: 0

Views: 78

Answers (1)

Praveen

Reputation: 2187

Finally, I have been able to resolve this issue.

The problem was that the log file's line endings are carriage return + newline (\r\n), not a bare \n.
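A quick way to confirm this kind of problem is to look at the repr() of a line, since that makes an otherwise invisible \r show up. This is only a minimal sketch: `sample` is a hypothetical stand-in for the start of the log file, not the real data.

```python
# Hypothetical sample standing in for the first lines of the log file.
sample = 'commit da83ddfdfb36f0c48ab2137efaa8c81a6bb41993\r\nAuthor: abc\r\n'

# Split on '\n' only, so a leftover '\r' stays attached to the line.
first_line = sample.split('\n')[0]

# True when the text uses Windows-style (\r\n) line endings.
has_crlf = first_line.endswith('\r')

# repr() shows the trailing '\r' that a plain print would hide.
print(repr(first_line))
```

If `has_crlf` is true, a pattern that expects only \n after the commit ID will not match.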

Once the regex is changed to include \r\n, it captures the groups correctly. This code is working:

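import re

# Raw string, so \r and \n reach the regex engine as escapes.
# With re.VERBOSE, literal whitespace inside the pattern is ignored.
TEST_COMMIT = r'''
commit\ (?P<commit>[a-f0-9]+)\r\n
(?P<author>Author.*)\r\n
(?P<committer>Commit.*)\r\n
(?P<message>.*)\r\n
'''
RE_COMMIT = re.compile(TEST_COMMIT, re.MULTILINE | re.VERBOSE)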

commits = RE_COMMIT.finditer(data)
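If the same script may also see logs with plain \n line endings, one option is to make the \r optional. The following is a sketch under assumptions, not a drop-in replacement: `data`, `PATTERN` and the example.com addresses are hypothetical, and the message group is left out to keep it short.

```python
import re

# Hypothetical sample data with Windows line endings, modelled on the log above.
data = (
    'commit da83ddfdfb36f0c48ab2137efaa8c81a6bb41993\r\n'
    'Author: abc <abc@example.com>\r\n'
    'Commit: abc <abc@example.com>\r\n'
)

# \r?\n accepts both \r\n and \n; [^\r\n]* keeps the \r out of the groups.
PATTERN = re.compile(r'''
    commit\ (?P<commit>[a-f0-9]+)\r?\n
    (?P<author>Author[^\r\n]*)\r?\n
    (?P<committer>Commit[^\r\n]*)\r?\n
''', re.VERBOSE)

# One dict of named groups per matched commit header.
results = [m.groupdict() for m in PATTERN.finditer(data)]

# The same pattern also matches after converting to Unix line endings.
results_lf = [m.groupdict() for m in PATTERN.finditer(data.replace('\r\n', '\n'))]
```

Note that the pattern is a raw string. With re.VERBOSE, a non-raw '\n' becomes a literal newline character in the pattern, and verbose mode ignores it as whitespace, so the line break is never actually matched.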

Upvotes: 1
