regex for python (version number - date format)

Question

I have a file like the one below. Each version heading has the form: version number, space(s), dash, space(s), date. I want to create a dictionary with 4.11.1 - 2020-02-25 as a key and everything after it, up to 3.25.0 - 2019-01-01, as its value, and so on until the end of the file.

##################
Some texts

4.11.1 - 2020-02-25
-------------------

*some text

** Some more text

3.25.0 - 2019-01-01
-------------------

*some text

** Some more text

This is what I tried:

import re

# Text holds the contents of the file
result = {}
matches = re.findall(r'([\d.]+[^\n]+)\s*(.*?)(?=\s*[\d.]+[^\n]+|$)', Text, re.S)
for match in matches:
    result[match[0]] = match[1]
print(result)

It works in most cases, but it also produces these as keys:

.com/sth/sth/sth/6)

1.8.2 (https://github.com/sth/sth/sth/5)

1.8.1.

20160918 (see commands under 'some text')

. text text tex

The fourth bird · Accepted Answer

You could use 2 capturing groups and, instead of re.S, use re.M so that ^ and $ match at the start and end of each line.

The pattern captures the heading in group 1: a version followed by space(s), a dash and space(s), using \d+(?:\.\d+)+ +- +, and then a date-like pattern \d{4}-\d{2}-\d{2}

Note that it does not validate the date itself. The date pattern can be made more specific if needed.
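
Because \d{4}-\d{2}-\d{2} also accepts strings such as 2020-13-45, one option is to check each captured date after matching rather than inside the regex. A minimal sketch, assuming dates are always in YYYY-MM-DD form; the helper name is_valid_date is made up for illustration:

```python
from datetime import datetime


def is_valid_date(text):
    """Return True if text is a real calendar date in YYYY-MM-DD form."""
    try:
        datetime.strptime(text, "%Y-%m-%d")
        return True
    except ValueError:
        # strptime raises ValueError for impossible dates such as month 13
        return False


print(is_valid_date("2020-02-25"))  # True
print(is_valid_date("2020-13-45"))  # False
```

The date part of a matched heading could be passed to such a helper before the heading is stored as a key.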

Capture group 2 matches all following lines that do not start with 1+ digits, a dot and a digit. You can make that part more specific if you want.

^(\d+(?:\.\d+)+ +- +\d{4}-\d{2}-\d{2})\r?\n((?:(?!\d+\.\d).*(?:\r?\n|$))*)
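
To make the parts of the pattern easier to follow, it can also be written with re.X (verbose mode), where whitespace is ignored and comments are allowed. This is only a sketch of the same regex: the name HEADING_AND_BODY and the sample string are invented for illustration, and literal spaces are written as [ ] because verbose mode would otherwise drop them.

```python
import re

HEADING_AND_BODY = re.compile(r"""
    ^(                        # group 1: the heading line
        \d+(?:\.\d+)+         #   version such as 4.11.1
        [ ]+-[ ]+             #   space(s), dash, space(s)
        \d{4}-\d{2}-\d{2}     #   date-like part, e.g. 2020-02-25
    )
    \r?\n                     # end of the heading line
    (                         # group 2: the body
        (?:
            (?!\d+\.\d)       #   stop at a line starting with digits, dot, digit
            .*(?:\r?\n|$)     #   otherwise consume the whole line
        )*
    )
""", re.M | re.X)

sample = "intro\n1.2.3 - 2020-01-31\n----\n* note\n1.2.2 - 2019-12-01\n----\n"
for heading, body in HEADING_AND_BODY.findall(sample):
    print(repr(heading), repr(body))
```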


import re

result = {}
Text = ("##################\n"
    "Some texts\n\n"
    "4.11.1 - 2020-02-25\n"
    "-------------------\n\n"
    "*some text\n\n"
    "** Some more text\n\n"
    "3.25.0 - 2019-01-01\n"
    "-------------------\n\n"
    "*some text\n\n"
    "** Some more text")
# Group 1 is the heading line, group 2 is everything up to the next version-like line
matches = re.findall(r'^(\d+(?:\.\d+)+ +- +\d{4}-\d{2}-\d{2})\r?\n((?:(?!\d+\.\d).*(?:\r?\n|$))*)', Text, re.M)

for match in matches:
    result[match[0]] = match[1]
print(result)

Output

{'4.11.1 - 2020-02-25': '-------------------\n\n*some text\n\n** Some more text\n\n', '3.25.0 - 2019-01-01': '-------------------\n\n*some text\n\n** Some more text'}
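
One caveat: group 2 stops at any line that begins with digits, a dot and a digit, so a body line such as 1.8.2 (https://github.com/sth/sth/sth/5) would cut the value short. As noted, that part can be made more specific. A possible sketch reuses the full heading pattern inside the negative lookahead, so only real headings end a section. The names HEADING, PATTERN and text are illustrative, and the sample input is invented from lines shown in the question.

```python
import re

# Heading: version, space(s), dash, space(s), date-like part
HEADING = r"\d+(?:\.\d+)+ +- +\d{4}-\d{2}-\d{2}"

# The body now ends only at a line that is a complete heading
PATTERN = re.compile(
    rf"^({HEADING})\r?\n((?:(?!{HEADING}\r?$).*(?:\r?\n|$))*)",
    re.M,
)

text = (
    "4.11.1 - 2020-02-25\n"
    "-------------------\n"
    "* fixed in\n"
    "1.8.2 (https://github.com/sth/sth/sth/5)\n"
    "3.25.0 - 2019-01-01\n"
    "-------------------\n"
    "* some text\n"
)

result = {heading: body for heading, body in PATTERN.findall(text)}
for heading, body in result.items():
    print(repr(heading), repr(body))
```

With this variant the 1.8.2 line stays inside the value for 4.11.1 - 2020-02-25 instead of being dropped.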
