Cybertron

Reputation: 155

Regex pattern matching in Python

I am trying to split the following data:

rest = [" hgod eruehf 10 SECTION 1. DATA: find my book 2.11.111 COLUMN: get me tea","111.2 CONTAIN  i am good"]

match = re.compile(r'(((\d[.])(\d[.]))+\s(\w[A-Z]+:|\w+))')
out = match.search(rest)
print(out.group(0))

The pattern I identified is: a dotted number (e.g. 1. / 1.1. / 1.21.1), followed by text that runs until the next dotted number (e.g. 1. / 1.1. / 1.21.1).

I want to split the data as

  1. DATA: find my book

2.11.111 COLUMN: get me tea

111.2 CONTAIN i am good

Is there any way to split the text data based on this pattern?

Upvotes: 1

Views: 74

Answers (1)

Wiktor Stribiżew

Reputation: 626738

You may get the expected matches using

import re
rest = [" hgod eruehf 10 SECTION 1. DATA: find my book 2.11.111 COLUMN: get me tea","111.2 CONTAIN  i am good"]
res = []
for s in rest:
    res.extend(re.findall(r'\d+(?=\.)(?:\.\d+)*.*?(?=\s*\d+(?=\.)(?:\.\d+)*|\Z)', s))

print(res)
# => ['1. DATA: find my book', '2.11.111 COLUMN: get me tea', '111.2 CONTAIN  i am good']

See the Python demo

The regex is applied to each item in the rest list, and all matches are collected in the res list.
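The per-item loop matters because the `re` functions expect a string, so passing the whole list raises a `TypeError`. The following is a minimal sketch of that point; the names `pattern` and `list_accepted` are only illustrative and do not come from the original post.

```python
import re

rest = [" hgod eruehf 10 SECTION 1. DATA: find my book 2.11.111 COLUMN: get me tea",
        "111.2 CONTAIN  i am good"]

# Same regex as in the answer, compiled once for reuse (illustrative name).
pattern = re.compile(r'\d+(?=\.)(?:\.\d+)*.*?(?=\s*\d+(?=\.)(?:\.\d+)*|\Z)')

# Passing the list itself fails: the re functions need a str (or bytes) argument.
try:
    pattern.search(rest)
    list_accepted = True
except TypeError:
    list_accepted = False

# Applying the pattern to each string and flattening the matches works.
res = [m for s in rest for m in pattern.findall(s)]

print(list_accepted)
print(res)
```

The list comprehension is equivalent to the `for` loop with `res.extend(...)`; which one to use is a matter of taste.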

Pattern details

  • \d+ - 1+ digits
  • (?=\.) - there must be a . immediately to the right of the current position
  • (?:\.\d+)* - 0 or more repetitions of a . and then 1+ digits
  • .*? - 0+ chars other than newline, as few as possible
  • (?=\s*\d+(?=\.)(?:\.\d+)*|\Z) - a positive lookahead that stops the match right before either 0+ whitespaces followed by the next number (1+ digits with a . immediately to the right, then 0 or more repetitions of a . and 1+ digits), or the end of the string
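The breakdown above can also be written into the pattern itself with `re.VERBOSE`, which may make the regex easier to maintain. This is only a sketch: `NUMBERING`, `SECTION_RE` and `split_sections` are hypothetical names introduced here for illustration, and the pattern is the same one as in the answer, just spread over several lines.

```python
import re

# Dotted number: 1+ digits that are followed by a dot, then optional ".digits" groups.
NUMBERING = r'\d+(?=\.)(?:\.\d+)*'

SECTION_RE = re.compile(
    rf'''
    {NUMBERING}          # leading number, e.g. 1 / 2.11.111 / 111.2
    .*?                  # following text, as few characters as possible
    (?=                  # stop right before either
        \s*{NUMBERING}   #   optional whitespace and the next dotted number
      | \Z               #   or the end of the string
    )
    ''',
    re.VERBOSE,
)

def split_sections(items):
    """Collect every numbered chunk found in each string of items."""
    res = []
    for s in items:
        res.extend(SECTION_RE.findall(s))
    return res

rest = [" hgod eruehf 10 SECTION 1. DATA: find my book 2.11.111 COLUMN: get me tea",
        "111.2 CONTAIN  i am good"]
print(split_sections(rest))
# => ['1. DATA: find my book', '2.11.111 COLUMN: get me tea', '111.2 CONTAIN  i am good']
```

Note that the plain 10 in the first string is skipped because it is not followed by a dot, which is what the (?=\.) lookahead enforces.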

Upvotes: 2
