Reputation: 449
I have a large file containing many lines, some of which follow a unique pattern. I want to split the file based on this pattern. Below is the data in the text file:
commit e6bcab96ffe1f55e80be8d0c1e5342fb9d69ca30
Date: Sat Jun 9 04:11:37 2018 +0530
configurations
commit 5c8deb3114b4ed17c5d2ea31842869515073670f
Date: Sat Jun 9 02:59:56 2018 +0530
remote
commit 499516b7e4f95daee4f839f34cc46df404b52d7a
Date: Sat Jun 9 02:52:51 2018 +0530
remote fix
This reverts commit 0a2917bd49eec7ca2f380c890300d75b69152353.
commit 349e1b42d3b3d23e95a227a1ab744fc6167e6893
Date: Sat Jun 9 02:52:37 2018 +0530
Revert "Removing the printf added"
This reverts commit da0fac94719176009188ce40864b09cfb84ca590.
commit 8bfd4e7086ff5987491f280b57d10c1b6e6433fe
Date: Sat Jun 9 02:52:18 2018 +0530
Revert Bulk
This reverts commit c2ee318635987d44e579c92d0b86b003e1d2a076.
commit bcb10c54068602a96d367ec09f08530ede8059ef
Date: Fri Jun 8 19:53:03 2018 +0530
fix crash observed
commit a84169f79fbe9b18702f6885b0070bce54d6dd5a
Date: Fri Jun 8 18:14:21 2018 +0530
Interface PBR
commit 254726fe3fe0b9f6b228189e8a6fe7bdf4aa9314
Date: Fri Jun 8 18:12:10 2018 +0530
Crash observed
commit 18e7106d54e19310d32e8b31d584cec214fb2cb7
Date: Fri Jun 8 18:09:13 2018 +0530
Changes to fix crash
Currently my code is as below:
import re
readtxtfile = r'C:\gitlog.txt'
with open(readtxtfile) as fp:
    txtrawdata = fp.read()
commits = re.split(r'^(commit|)[ a-zA-Z0-9]{40}$',txtrawdata)
print(commits)
Expected output: I want to split the above text at each line such as "commit 18e7106d54e19310d32e8b31d584cec214fb2cb7" and collect the pieces in a Python list.
Upvotes: 2
Views: 6114
Reputation: 195593
This regex is explained on Regex101.
import re
# data is assumed to hold the full text of the log file, e.g. from fp.read()
groups = re.findall(r'(^\s*commit\s+[a-z0-9]+.*?)(?=^commit|\Z)', data, flags=re.DOTALL|re.MULTILINE)
for g in groups:
    print(g)
    print('-' * 80)
Prints:
commit e6bcab96ffe1f55e80be8d0c1e5342fb9d69ca30
Date: Sat Jun 9 04:11:37 2018 +0530
configurations
--------------------------------------------------------------------------------
commit 5c8deb3114b4ed17c5d2ea31842869515073670f
Date: Sat Jun 9 02:59:56 2018 +0530
remote
--------------------------------------------------------------------------------
commit 499516b7e4f95daee4f839f34cc46df404b52d7a
Date: Sat Jun 9 02:52:51 2018 +0530
remote fix
This reverts commit 0a2917bd49eec7ca2f380c890300d75b69152353.
--------------------------------------------------------------------------------
...and so on
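A related sketch, assuming Python 3.7 or later (where re.split accepts a pattern that can match the empty string): splitting on a zero-width lookahead keeps each commit line at the head of its chunk. The name split_commits and the shortened sample string are illustrative, not part of the question.

```python
import re

# Zero-width split point: the start of any line that is exactly "commit <40 hex chars>".
COMMIT_START = re.compile(r'^(?=commit [0-9a-f]{40}$)', flags=re.MULTILINE)

def split_commits(log_text):
    """Return one string per commit, each starting with its 'commit <sha>' line."""
    # Needs Python 3.7+; the empty piece before the first commit is dropped.
    return [chunk for chunk in COMMIT_START.split(log_text) if chunk.strip()]

# Shortened sample in the same shape as the question's data (illustrative).
sample = (
    "commit e6bcab96ffe1f55e80be8d0c1e5342fb9d69ca30\n"
    "Date: Sat Jun 9 04:11:37 2018 +0530\n"
    "configurations\n"
    "commit 499516b7e4f95daee4f839f34cc46df404b52d7a\n"
    "Date: Sat Jun 9 02:52:51 2018 +0530\n"
    "remote fix\n"
    "This reverts commit 0a2917bd49eec7ca2f380c890300d75b69152353.\n"
)

chunks = split_commits(sample)
for chunk in chunks:
    print(chunk)
    print('-' * 80)
```

Because the pattern requires the SHA to fill the whole line, a message line such as "This reverts commit ..." does not start a new chunk.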
Upvotes: 1
Reputation: 26
This will extract the commit lines:
import re
commits = list()
readtxtfile = r'C:\gitlog.txt'
with open(readtxtfile) as fp:
    for line in fp:
        m = re.match(r'^commit\s+([a-f0-9]{40})$', line)
        if m:
            # group(0) is the whole "commit <sha>" text; group(1) is just the sha
            commits.append(m.group(0))
commits is now a list of the matched "commit <sha>" strings. If your git log output format changes, the regex will have to change with it. Make sure you're generating the log with --no-abbrev-commit.
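If the lines belonging to each commit are needed as well, the same line-by-line match can drive a grouping loop. This is a minimal sketch under that assumption; group_by_commit and the sample lines are hypothetical names and data chosen for illustration. It reads one line at a time, so it never holds the raw file as a single string.

```python
import re

COMMIT_LINE = re.compile(r'^commit\s+([a-f0-9]{40})$')

def group_by_commit(lines):
    """Group an iterable of log lines into (sha, [following lines]) pairs."""
    groups = []
    for line in lines:
        line = line.rstrip('\n')
        m = COMMIT_LINE.match(line)
        if m:
            groups.append((m.group(1), []))   # a commit line starts a new group
        elif groups:
            groups[-1][1].append(line)        # any other line joins the current group
        # lines that appear before the first commit are ignored
    return groups

# Illustrative input; an open file object can be passed in the same way.
sample_lines = [
    "commit 5c8deb3114b4ed17c5d2ea31842869515073670f\n",
    "Date: Sat Jun 9 02:59:56 2018 +0530\n",
    "remote\n",
    "commit 499516b7e4f95daee4f839f34cc46df404b52d7a\n",
    "Date: Sat Jun 9 02:52:51 2018 +0530\n",
    "remote fix\n",
    "This reverts commit 0a2917bd49eec7ca2f380c890300d75b69152353.\n",
]

groups = group_by_commit(sample_lines)
for sha, body in groups:
    print(sha, body)
```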
Upvotes: 0
Reputation: 107095
import re
text = ''' commit e6bcab96ffe1f55e80be8d0c1e5342fb9d69ca30
Date: Sat Jun 9 04:11:37 2018 +0530

 configurations
commit 5c8deb3114b4ed17c5d2ea31842869515073670f
Date: Sat Jun 9 02:59:56 2018 +0530

 remote
commit 499516b7e4f95daee4f839f34cc46df404b52d7a
Date: Sat Jun 9 02:52:51 2018 +0530

 remote fix
 This reverts commit 0a2917bd49eec7ca2f380c890300d75b69152353.'''
print(re.split(r'^\s*commit \S*\s*', text, flags=re.MULTILINE))
This outputs:
['', 'Date: Sat Jun 9 04:11:37 2018 +0530\n\n configurations\n', 'Date: Sat Jun 9 02:59:56 2018 +0530\n\n remote\n', 'Date: Sat Jun 9 02:52:51 2018 +0530\n\n remote fix\n This reverts commit 0a2917bd49eec7ca2f380c890300d75b69152353.']
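The split above discards the SHAs. As a sketch of one way to keep them, a capturing group can be added to the pattern: re.split then returns the captured text between the pieces. The variable names and the shortened sample are illustrative assumptions.

```python
import re

# Shortened sample in the same shape as the question's data (illustrative).
text = (
    "commit e6bcab96ffe1f55e80be8d0c1e5342fb9d69ca30\n"
    "Date: Sat Jun 9 04:11:37 2018 +0530\n"
    "configurations\n"
    "commit 5c8deb3114b4ed17c5d2ea31842869515073670f\n"
    "Date: Sat Jun 9 02:59:56 2018 +0530\n"
    "remote\n"
)

# The capturing group (\S+) makes re.split include each SHA in its result.
parts = re.split(r'^\s*commit (\S+)\s*', text, flags=re.MULTILINE)

# parts[0] is whatever preceded the first commit; after it, SHAs and bodies alternate.
shas, bodies = parts[1::2], parts[2::2]
commits = dict(zip(shas, bodies))

for sha, body in commits.items():
    print(sha)
    print(body)
```

Note that this pattern also splits at any message line that happens to begin with "commit "; restricting the group to [0-9a-f]{40} followed by the end of the line would make it stricter.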
Upvotes: 1