omer

Reputation: 562

Grouping repeated sections that start with a specific word and end with a specific format

I am trying to capture multiline strings that start with a specific word, Case, and end with a date in the format dd.mm.yyyy or dd.m.yyyy.

Here is sample text:

Case No.X.1 I know I am a lucky boy (1) a sda asddasd (ii) sa asdas asd aklk Railway, Airplane asd - (one two three four). Closing date ......... 29.8.1818

Case No.X.1 I know I am a lucky boy (1) a sda asddasd (ii) sa asdas asd aklk Railway, Airplane asd - (one two three four). Closing date ......... 29.8.1818

Case No.X.1 I know I am a lucky boy (1) a sda asddasd (ii) sa asdas asd aklk Railway, Airplane asd - (one two three four). Closing date ......... 29.8.1818

Case No.X.1 I know I am a lucky boy (1) a sda asddasd (ii) sa asdas asd aklk Railway, Airplane asd - (one two three four). Closing date ......... 29.8.1818

I am trying this:

Flags: g m i

^case[^]*\d{1,2}\.\d{1,2}\.\d{2,4}
^case[\s\S]*\d{1,2}\.\d{1,2}\.\d{2,4}
((^case)[\s\S]+(\d{1,2}\.\d{1,2}\.\d{2,4}))

Note: the case-insensitive flag is set.

I expect to get one group per paragraph (from Case to the date).

Instead, each of these expressions captures only a single group, running from the first Case to the last date:

Case No.X.1 I know I am a lucky boy (1) a sda asddasd (ii) sa asdas asd aklk Railway, Airplane asd - (one two three four). Closing date .........29.8.1818

Case No.X.1 I know I am a lucky boy (1) a sda asddasd (ii) sa asdas asd aklk Railway, Airplane asd - (one two three four). Closing date .........29.8.1818

Case No.X.1 I know I am a lucky boy (1) a sda asddasd (ii) sa asdas asd aklk Railway, Airplane asd - (one two three four). Closing date .........29.8.1818

Case No.X.1 I know I am a lucky boy (1) a sda asddasd (ii) sa asdas asd aklk Railway, Airplane asd - (one two three four). Closing date .........29.8.1818

I am looking into newline handling and lookarounds.

Upvotes: 0

Views: 32

Answers (1)

Andrej Kesely

Reputation: 195408

Use a lazy quantifier (.*?) with flags=re.DOTALL|re.M (regex101):

data = '''Case No.X.1 I know I am a lucky boy (1) a sda asddasd (ii)
sa asdas asd aklk Railway, Airplane asd - (one two three four). Closing date ......... 29.8.1818

Case No.X.2 I know I am a lucky boy (1) a sda asddasd (ii)
sa asdas asd aklk Railway, Airplane asd - (one two three four). Closing date ......... 29.8.1818

Case No.X.3 I know I am a lucky boy (1) a sda asddasd (ii)
sa asdas asd aklk Railway, Airplane asd - (one two three four). Closing date ......... 29.8.1818

Case No.X.4 I know I am a lucky boy (1) a sda asddasd (ii)
sa asdas asd aklk Railway, Airplane asd - (one two three four). Closing date ......... 29.8.1818'''

import re

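# re.DOTALL lets . cross newlines; re.M makes ^ and $ match at every line.
# The lazy .*? stops at the first date that ends a line, giving one match per case.
for m in re.findall(r'^Case.*?\d{1,2}\.\d{1,2}\.\d{2,4}$', data, flags=re.DOTALL|re.M):
    print(m)
    print('-' * 160)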

Prints:

Case No.X.1 I know I am a lucky boy (1) a sda asddasd (ii)
sa asdas asd aklk Railway, Airplane asd - (one two three four). Closing date ......... 29.8.1818
----------------------------------------------------------------------------------------------------------------------------------------------------------------
Case No.X.2 I know I am a lucky boy (1) a sda asddasd (ii)
sa asdas asd aklk Railway, Airplane asd - (one two three four). Closing date ......... 29.8.1818
----------------------------------------------------------------------------------------------------------------------------------------------------------------
Case No.X.3 I know I am a lucky boy (1) a sda asddasd (ii)
sa asdas asd aklk Railway, Airplane asd - (one two three four). Closing date ......... 29.8.1818
----------------------------------------------------------------------------------------------------------------------------------------------------------------
Case No.X.4 I know I am a lucky boy (1) a sda asddasd (ii)
sa asdas asd aklk Railway, Airplane asd - (one two three four). Closing date ......... 29.8.1818
----------------------------------------------------------------------------------------------------------------------------------------------------------------
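
To see why the attempts in the question return a single match, here is a minimal sketch comparing a greedy and a lazy quantifier. The two-case sample text and the names DATE, FLAGS, greedy and lazy are made up for illustration; re.IGNORECASE is added only because the question sets the case-insensitive flag.

```python
import re

# Hypothetical sample: two cases, each spanning two lines and ending in a date.
text = (
    "Case No.X.1 first line\n"
    "second line. Closing date ......... 29.8.1818\n"
    "\n"
    "case No.X.2 first line\n"
    "second line. Closing date ......... 1.12.1820\n"
)

DATE = r"\d{1,2}\.\d{1,2}\.\d{2,4}"
FLAGS = re.DOTALL | re.MULTILINE | re.IGNORECASE

# Greedy .* runs to the end of the text and backtracks to the LAST date.
greedy = re.findall(rf"^case.*{DATE}$", text, flags=FLAGS)

# Lazy .*? grows one character at a time and stops at the FIRST date
# that is followed by an end of line.
lazy = re.findall(rf"^case.*?{DATE}$", text, flags=FLAGS)

print(len(greedy))  # 1
print(len(lazy))    # 2
```

The greedy pattern behaves like [\s\S]* and [\s\S]+ in the question: it swallows everything between the first Case and the last date. Making the quantifier lazy is enough here; no lookaround is needed.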

Upvotes: 1
