Reputation: 6371
Given a unicode object with the following text:
a
b
c
d
e

aaaa
bbbb
cccc
dddd
eeee
I'd like to get the second group of lines, in other words, every line after the blank one. This is the code I've used:
import re
text = ... # the previous text
exp = u'a\nb\nc\nd\ne\n{2}(.*\n){5}'
matches = re.findall(exp, text, re.U)
This only retrieves the last line, though. What can I do to get all five?
Upvotes: 3
Views: 84
Reputation: 336158
You're repeating the capturing group itself, so each repetition overwrites the previous one and only the last is kept.
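A minimal Python 3 sketch of that overwrite behaviour, assuming the question's sample text ends with a trailing newline (the `text` literal below is a reconstruction, not the asker's actual data):

```python
import re

# Assumed sample: the question's lines, blank line included, with a trailing newline.
text = "a\nb\nc\nd\ne\n\naaaa\nbbbb\ncccc\ndddd\neeee\n"

# The capturing group (.*\n) is repeated five times; each repetition
# overwrites the group's value, so only the last line survives.
pattern = r"a\nb\nc\nd\ne\n{2}(.*\n){5}"
print(re.findall(pattern, text))  # ['eeee\n']
```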
If you do this instead (a non-capturing group inside a capturing one)
exp = r'a\nb\nc\nd\ne\n{2}((?:.*\n){5})'
you get all five lines together as a single string in one group.
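A short Python 3 sketch of using that pattern and then splitting the captured block into lines, again assuming the sample text ends with a newline:

```python
import re

# Assumed sample text (the question's lines, with a trailing newline).
text = "a\nb\nc\nd\ne\n\naaaa\nbbbb\ncccc\ndddd\neeee\n"

# (?:...) groups without capturing; the outer (...) captures all five lines at once.
pattern = r"a\nb\nc\nd\ne\n{2}((?:.*\n){5})"
block = re.search(pattern, text).group(1)
print(repr(block))  # 'aaaa\nbbbb\ncccc\ndddd\neeee\n'

# Split the single captured string into individual lines.
lines = block.splitlines()
print(lines)  # ['aaaa', 'bbbb', 'cccc', 'dddd', 'eeee']
```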
You can't get at the individual lines unless you spell out each group manually:
exp = r'a\nb\nc\nd\ne\n{2}(.*\n)(.*\n)(.*\n)(.*\n)(.*\n)'
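A Python 3 sketch of the spelled-out version, assuming the same sample text. Building the five groups with string repetition is just a convenience here; it produces the same pattern as writing them out. With more than one group, `re.findall` returns one tuple of groups per match:

```python
import re

# Assumed sample text (the question's lines, with a trailing newline).
text = "a\nb\nc\nd\ne\n\naaaa\nbbbb\ncccc\ndddd\neeee\n"

# Same pattern as spelling out (.*\n) five times by hand.
pattern = r"a\nb\nc\nd\ne\n{2}" + r"(.*\n)" * 5
matches = re.findall(pattern, text)
print(matches)  # [('aaaa\n', 'bbbb\n', 'cccc\n', 'dddd\n', 'eeee\n')]

# Drop the trailing newline from each captured line of the first match.
lines = [line.rstrip("\n") for line in matches[0]]
print(lines)  # ['aaaa', 'bbbb', 'cccc', 'dddd', 'eeee']
```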
Upvotes: 4
Reputation: 882
If the part you don't want has some limit on its length (here, one character per line), why not search only for words longer than that, like:
(?m)^[a-z]{2,}
This gets every word of two or more letters at the start of a line. The (?m) (multiline) flag makes ^ match at the start of each line; without it, ^ only matches at the start of the text.
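A small Python 3 sketch of the length-based approach, assuming (as the answer does) that every unwanted line is a single letter and the wanted lines are longer; the sample text is a reconstruction of the question's:

```python
import re

# Assumed sample text (the question's lines, with a trailing newline).
text = "a\nb\nc\nd\ne\n\naaaa\nbbbb\ncccc\ndddd\neeee\n"

# re.MULTILINE makes ^ match at the start of every line, not only the string.
# {2,} means "two or more", so the single-letter lines are skipped.
words = re.findall(r"^[a-z]{2,}", text, re.MULTILINE)
print(words)  # ['aaaa', 'bbbb', 'cccc', 'dddd', 'eeee']

# Without the flag, ^ only anchors at position 0, where "a" is a single letter.
no_flag = re.findall(r"^[a-z]{2,}", text)
print(no_flag)  # []
```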
Upvotes: 0
Reputation: 142156
Why not just:
text[text.index('\n\n') + 2:].splitlines()
# ['aaaa', 'bbbb', 'cccc', 'dddd', 'eeee']
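A self-contained Python 3 version with the sample text filled in (an assumed reconstruction with a trailing newline). Note that `str.index` raises `ValueError` when there is no blank line; `str.partition` is one alternative that instead yields an empty list in that case:

```python
# Assumed sample text (the question's lines, with a trailing newline).
text = "a\nb\nc\nd\ne\n\naaaa\nbbbb\ncccc\ndddd\neeee\n"

# Slice everything after the first blank line, then split into lines.
after_blank = text[text.index("\n\n") + 2:].splitlines()
print(after_blank)  # ['aaaa', 'bbbb', 'cccc', 'dddd', 'eeee']

# partition returns (text, '', '') when the separator is missing,
# so the third element is empty instead of raising an exception.
_, _, rest = text.partition("\n\n")
print(rest.splitlines())  # ['aaaa', 'bbbb', 'cccc', 'dddd', 'eeee']
```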
Upvotes: 2