Reputation: 155
I am trying to split the following data:
import re

rest = [" hgod eruehf 10 SECTION 1. DATA: find my book 2.11.111 COLUMN: get me tea","111.2 CONTAIN i am good"]
match = re.compile(r'(((\d[.])(\d[.]))+\s(\w[A-Z]+:|\w+))')
# rest is a list, but search() expects a string, so this call raises TypeError
out = match.search(rest)
print(out.group(0))
The pattern I found is: a dotted number (e.g. 1. / 1.1. / 1.21.1), followed by text that runs up to the next dotted number (e.g. 1. / 1.1. / 1.21.1).
I want to split the data as:
2.11.111 COLUMN: get me tea
111.2 CONTAIN i am good
Is there any way to split the text data based on this pattern?
Upvotes: 1
Views: 74
Reputation: 626738
You may get the expected matches using:
import re
rest = [" hgod eruehf 10 SECTION 1. DATA: find my book 2.11.111 COLUMN: get me tea","111.2 CONTAIN i am good"]
res = []
for s in rest:
res.extend(re.findall(r'\d+(?=\.)(?:\.\d+)*.*?(?=\s*\d+(?=\.)(?:\.\d+)*|\Z)', s))
print(res)
# => ['1. DATA: find my book', '2.11.111 COLUMN: get me tea', '111.2 CONTAIN i am good']
The regex is applied to each item in the rest list, and all matches are saved into the res list.
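The per-item loop can also be wrapped in a small helper so the pattern is compiled only once. This is only a sketch: the names SECTION_RE and split_sections are made up for illustration and do not come from the original answer; the pattern itself is unchanged.

```python
import re

# Same pattern as in the answer, compiled once for reuse.
# SECTION_RE and split_sections are hypothetical names used for illustration.
SECTION_RE = re.compile(r'\d+(?=\.)(?:\.\d+)*.*?(?=\s*\d+(?=\.)(?:\.\d+)*|\Z)')

def split_sections(items):
    """Collect every numbered section found in each string of `items`."""
    sections = []
    for text in items:
        # The pattern has no capturing groups, so findall returns whole matches.
        sections.extend(SECTION_RE.findall(text))
    return sections

rest = [" hgod eruehf 10 SECTION 1. DATA: find my book 2.11.111 COLUMN: get me tea",
        "111.2 CONTAIN i am good"]
print(split_sections(rest))
```

A string that contains no number followed by a dot simply contributes nothing to the result.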
Pattern details
\d+ - 1+ digits
(?=\.) - there must be a . immediately to the right of the current position
(?:\.\d+)* - 0 or more repetitions of a . and then 1+ digits
.*? - 0+ chars other than newline, as few as possible
(?=\s*\d+(?=\.)(?:\.\d+)*|\Z) - up to the 0+ whitespaces, 1+ digits with a . immediately to the right of the current position, 0 or more repetitions of a . and then 1+ digits, or end of string
Upvotes: 2
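The breakdown of the pattern can also be kept next to the pattern itself by compiling it with re.VERBOSE, which ignores whitespace in the pattern and allows # comments. The following is a sketch of that idea, assuming the same regex as in the answer; VERBOSE_SECTION_RE is a hypothetical name chosen here.

```python
import re

# The answer's pattern, spread over several lines with re.VERBOSE.
# VERBOSE_SECTION_RE is a hypothetical name used for illustration.
VERBOSE_SECTION_RE = re.compile(r"""
    \d+                          # 1+ digits
    (?=\.)                       # a dot must follow immediately
    (?:\.\d+)*                   # 0 or more repetitions of a dot and 1+ digits
    .*?                          # any chars except newline, as few as possible
    (?=                          # stop right before either
        \s*\d+(?=\.)(?:\.\d+)*   #   optional whitespace and the next dotted number
        |\Z                      #   or the end of the string
    )
""", re.VERBOSE)

text = " hgod eruehf 10 SECTION 1. DATA: find my book 2.11.111 COLUMN: get me tea"
print(VERBOSE_SECTION_RE.findall(text))
```

Because the original pattern contains no literal spaces or # characters, the verbose form should match exactly the same text as the one-line version.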