Reputation: 198
I have a file like this. The format of each version line is <version> space(s) dash space(s) date. I want to create a dictionary with 4.11.1 - 2020-02-25 as the key and everything after it, up to 3.25.0 - 2019-01-01, as the value, and so on until the end of the file.
##################
Some texts
4.11.1 - 2020-02-25
-------------------
*some text
** Some more text
3.25.0 - 2019-01-01
-------------------
*some text
** Some more text
This is what I tried:
import re

result = {}
# Text holds the contents of the file
matches = re.findall(r'([\d.]+[^\n]+)\s*(.*?)(?=\s*[\d.]+[^\n]+|$)', Text, re.S)
for match in matches:
    result[match[0]] = match[1]
print(result)
It works for most cases, but it also produces keys such as these:
.com/sth/sth/sth/6)
1.8.2 (https://github.com/sth/sth/sth/5)
1.8.1.
20160918 (see commands under 'some text')
. text text tex
Upvotes: 0
Views: 105
Reputation: 163277
You could use 2 capturing groups and, instead of re.S, use re.M so that ^ matches at the start of every line.
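As a small illustration of why the anchor matters, here is a sketch on a made-up sample (loosely based on the unwanted keys in the question, not the real file). An unanchored pattern picks up version-like text in the middle of a line, while ^ under re.M only accepts it at the start of a line.

```python
import re

# Hypothetical sample, loosely modelled on the unwanted keys in the question.
sample = (
    "4.11.1 - 2020-02-25\n"
    "see 1.8.2 (https://github.com/sth/sth/sth/5)\n"
    "3.25.0 - 2019-01-01\n"
)

# Unanchored: digits and dots are found anywhere, even in the middle of a line.
loose = re.findall(r'[\d.]+[^\n]+', sample)

# Anchored with ^ under re.M: only text at the start of a line can match.
anchored = re.findall(r'^\d+(?:\.\d+)+ +- +\d{4}-\d{2}-\d{2}', sample, re.M)

print(loose)
print(anchored)
```

The loose pattern returns three items, including the one starting at 1.8.2, whereas the anchored one returns only the two version lines.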
The pattern captures in group 1 a version followed by space(s), a dash and space(s) using \d+(?:\.\d+)+ +- + followed by a date-like pattern \d{4}-\d{2}-\d{2}
Note that it does not validate the date itself. This page shows how you can make that date pattern more specific.
Capture group 2 matches all following lines that do not start with 1+ digits, a dot and a digit. You can make that part more specific if you want.
^(\d+(?:\.\d+)+ +- +\d{4}-\d{2}-\d{2})\r?\n((?:(?!\d+\.\d).*(?:\r?\n|$))*)
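If the one-line pattern is hard to read, the same pattern can be spelled out with re.X (verbose mode), which allows comments inside the regex. This is only a sketch: the name VERSION_BLOCK and the short sample text are made up for illustration. In verbose mode unescaped spaces are ignored, so the literal spaces are written as [ ].

```python
import re

# VERSION_BLOCK is a hypothetical name; the pattern is the one shown above.
VERSION_BLOCK = re.compile(r"""
    ^(                      # group 1: the heading line, anchored at line start
        \d+(?:\.\d+)+       #   version such as 4.11.1
        [ ]+-[ ]+           #   space(s), a dash, space(s)
        \d{4}-\d{2}-\d{2}   #   date-like text such as 2020-02-25, not validated
    )
    \r?\n                   # end of the heading line
    (                       # group 2: the body
        (?:
            (?!\d+\.\d)     #   stop at a line starting with digits, dot, digit
            .*              #   otherwise take the rest of the line
            (?:\r?\n|$)     #   plus its line ending, or the end of the text
        )*
    )
""", re.M | re.X)

# Shortened, made-up sample in the same shape as the file in the question.
text = (
    "Some texts\n"
    "4.11.1 - 2020-02-25\n"
    "---\n"
    "*some text\n"
    "3.25.0 - 2019-01-01\n"
    "---\n"
    "*more text"
)

# findall returns (group 1, group 2) tuples, which dict() accepts directly.
result = dict(VERSION_BLOCK.findall(text))
print(result)
```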
import re
result = {}
Text = ("##################\n"
"Some texts\n\n"
"4.11.1 - 2020-02-25\n"
"-------------------\n\n"
"*some text\n\n"
"** Some more text\n\n"
"3.25.0 - 2019-01-01\n"
"-------------------\n\n"
"*some text\n\n"
"** Some more text")
matches = re.findall(r'^(\d+(?:\.\d+)+ +- +\d{4}-\d{2}-\d{2})\r?\n((?:(?!\d+\.\d).*(?:\r?\n|$))*)', Text, re.M)
for match in matches:
    result[match[0]] = match[1]
print(result)
Output
{'4.11.1 - 2020-02-25': '-------------------\n\n*some text\n\n** Some more text\n\n', '3.25.0 - 2019-01-01': '-------------------\n\n*some text\n\n** Some more text'}
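Because the pattern only checks the shape of the date, a key such as 9.9.9 - 2020-13-45 would also be accepted. One possible way to filter those out afterwards is datetime.strptime, sketched below. The helper name has_valid_date and the sample dictionary are made up for illustration, and the sketch assumes the date is the last whitespace-separated token of the key.

```python
from datetime import datetime

def has_valid_date(key):
    """Return True if the last token of the key is a real calendar date."""
    date_text = key.split()[-1]  # assumes the date is the last token of the key
    try:
        datetime.strptime(date_text, "%Y-%m-%d")
    except ValueError:
        return False
    return True

# Hypothetical result dictionary; the second key has an impossible date.
result = {
    "4.11.1 - 2020-02-25": "*some text\n",
    "9.9.9 - 2020-13-45": "*some text\n",
}

# Keep only the entries whose key ends in a valid date.
valid = {key: value for key, value in result.items() if has_valid_date(key)}
print(valid)
```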
Upvotes: 3