mrutyunjay

Reputation: 8350

Regular expression to extract sections

My string looks like this:

[abc]
line_one xxxxxxxxxxxxxx
line_two xxxxxxxxxxxxxx
[pqr]
line_four xxxxxxxxxxxxxx
line_five xxxxxxxxxxxxxx
[xyz]
line_six  xxxxxxxxxxxxxx
line_seven  xxxxxxxxxxxxxx

I am trying to fetch these lines section by section. I tried the regular expressions below, but with no luck:

import re
result = re.compile(r'(\[.+\])')
details = result.findall(string)

With this I get the section names. Then I tried:

result = re.compile(r'(\[.+\]((\n)(.+))+)')

Any suggestions?
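Why the second attempt gets no luck: `(.+)` also matches the next `[pqr]` header line, so `((\n)(.+))+` keeps repeating to the end of the string and the whole input comes back as one match; and because the pattern has several capturing groups, `findall` returns tuples of groups rather than plain strings. A minimal sketch of one way to repair that pattern (an adaptation for illustration, not one of the posted answers): make the repeated group non-capturing and add a negative lookahead `(?!\[)` so a continuation line cannot start with `[`. The names `s`, `section_re` and `sections` are illustrative.

```python
import re

# Sample input from the question (note the double spaces in the [xyz] lines).
s = """[abc]
line_one xxxxxxxxxxxxxx
line_two xxxxxxxxxxxxxx
[pqr]
line_four xxxxxxxxxxxxxx
line_five xxxxxxxxxxxxxx
[xyz]
line_six  xxxxxxxxxxxxxx
line_seven  xxxxxxxxxxxxxx"""

# ^\[[^\]\n]*\]     a header such as [abc] at the start of a line
# (?:\n(?!\[).+)*   then zero or more lines that do not start with "["
# The group is non-capturing, so findall returns each whole match.
section_re = re.compile(r'^\[[^\]\n]*\](?:\n(?!\[).+)*', re.MULTILINE)

sections = section_re.findall(s)
for section in sections:
    print(section)
    print('---')
```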

Upvotes: 1

Views: 211

Answers (3)

Casimir et Hippolyte

Reputation: 89639

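With split:

[part for part in re.split(r'\n*(?=\[)', s) if part]

or

[part for part in re.split(r'(?m)\n*^(?=\[)', s) if part]

Since Python 3.7, re.split also splits on empty matches, so both patterns leave empty strings in the result (from the empty match at the start of the string and at each [ right after a matched newline); the list comprehension filters them out.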
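A self-contained sketch of the split approach for Python 3. It uses `\n+` instead of `\n*` (an adaptation of the patterns above, not the answer's exact regex): a pattern that needs at least one newline can never match the empty string, so Python 3.7+ does not add empty strings to the result. The last step, building a dict named `parsed` that maps each header name to its lines, is a hypothetical extra for illustration.

```python
import re

s = ("[abc]\nline_one xxxxxxxxxxxxxx\nline_two xxxxxxxxxxxxxx\n"
     "[pqr]\nline_four xxxxxxxxxxxxxx\nline_five xxxxxxxxxxxxxx\n"
     "[xyz]\nline_six  xxxxxxxxxxxxxx\nline_seven  xxxxxxxxxxxxxx")

# Split at each run of newlines that is directly followed by "[".
# The newlines are consumed by the split, so no section ends with "\n".
sections = re.split(r'\n+(?=\[)', s)

# Hypothetical extra step: map each header name to its body lines.
parsed = {}
for section in sections:
    header, *lines = section.split('\n')
    parsed[header.strip('[]')] = lines

print(sections)
print(parsed)
```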

Upvotes: 1

vks

Reputation: 67998

(\[[^\]]*\][^\[]+)(?:\s|$)

Try this; it gives you the lines section by section. See the demo:

http://regex101.com/r/mP1wO4/1

import re
# Group 1: a [header] plus everything up to the next "[", then a whitespace character or end of string
p = re.compile(r'(\[[^\]]*\][^\[]+)(?:\s|$)')
test_str = "[abc]\nline_one xxxxxxxxxxxxxx\nline_two xxxxxxxxxxxxxx\n[pqr]\nline_four xxxxxxxxxxxxxx\nline_five xxxxxxxxxxxxxx\n[xyz]\nline_six xxxxxxxxxxxxxx\nline_seven xxxxxxxxxxxxxx"

print(p.findall(test_str))
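The same pattern written with `re.VERBOSE` so each part can carry a comment. This is only a restatement of the regex above for readability; `pattern` and `sections` are illustrative names. Note how the group ends: `[^\[]+` first runs all the way to the next `[`, then gives back the newline before it so that `(?:\s|$)` can match, which is why that newline is not part of the captured section.

```python
import re

test_str = ("[abc]\nline_one xxxxxxxxxxxxxx\nline_two xxxxxxxxxxxxxx\n"
            "[pqr]\nline_four xxxxxxxxxxxxxx\nline_five xxxxxxxxxxxxxx\n"
            "[xyz]\nline_six xxxxxxxxxxxxxx\nline_seven xxxxxxxxxxxxxx")

pattern = re.compile(r"""
    (               # group 1: one whole section
      \[ [^\]]* \]  # the header, e.g. [abc]
      [^\[]+        # every following character up to the next '['
    )
    (?:\s|$)        # then one whitespace character (the newline) or end of string
""", re.VERBOSE)

sections = pattern.findall(test_str)
for section in sections:
    print(section)
```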

Upvotes: 1

Avinash Raj

Reputation: 174874

Use the re.findall function. You need to include \n inside the positive lookahead, so that the lazy .*? stops before the newline that precedes the next [] block instead of consuming it.

>>> import re
>>> m = re.findall(r'(?s)(?:^|\n)(\[[^\]]*\].*?)(?=\n\[[^\]]*\]|$)', s)
>>> m
['[abc]\nline_one xxxxxxxxxxxxxx\nline_two xxxxxxxxxxxxxx', '[pqr]\nline_four xxxxxxxxxxxxxx\nline_five xxxxxxxxxxxxxx', '[xyz]\nline_six  xxxxxxxxxxxxxx\nline_seven  xxxxxxxxxxxxxx']
>>> for i in m:
    print(i)


[abc]
line_one xxxxxxxxxxxxxx
line_two xxxxxxxxxxxxxx
[pqr]
line_four xxxxxxxxxxxxxx
line_five xxxxxxxxxxxxxx
[xyz]
line_six  xxxxxxxxxxxxxx
line_seven  xxxxxxxxxxxxxx
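To see why the `\n` belongs inside the lookahead, here is a sketch comparing the answer's pattern with a variant (constructed here for illustration) that drops the `\n` from the lookahead. Without it, the lazy `.*?` stops right at the next `[`, so the section swallows the newline before the next header; the next match then has neither `^` nor `\n` available to satisfy `(?:^|\n)`, and the `[pqr]` section is skipped.

```python
import re

s = ("[abc]\nline_one xxxxxxxxxxxxxx\nline_two xxxxxxxxxxxxxx\n"
     "[pqr]\nline_four xxxxxxxxxxxxxx\nline_five xxxxxxxxxxxxxx\n"
     "[xyz]\nline_six  xxxxxxxxxxxxxx\nline_seven  xxxxxxxxxxxxxx")

# The answer's pattern: the lookahead includes the newline before the next header.
with_newline = re.findall(
    r'(?s)(?:^|\n)(\[[^\]]*\].*?)(?=\n\[[^\]]*\]|$)', s)

# Variant for comparison: the lookahead checks only for the next header.
without_newline = re.findall(
    r'(?s)(?:^|\n)(\[[^\]]*\].*?)(?=\[[^\]]*\]|$)', s)

print(len(with_newline))     # 3 sections
print(len(without_newline))  # 2: [pqr] is skipped
print(repr(without_newline[0][-1]))  # '\n' swallowed from before [pqr]
```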

Upvotes: 1
