Reputation: 95
I'm trying to find any text between a '>' character and a newline, so I came up with this regex:
result = re.search(">(.*)\n", text).group(1)
It works fine when there is only one match, for example:
>test1
(something else here)
Where the result, as intended, is
test1
But whenever there is more than one match, it only returns the first one, as in:
>test1
(something else here)
>test2
(something else here)
Which should give something like
test1\ntest2
But instead just shows
test1
What am I missing? Thank you very much in advance.
Upvotes: 0
Views: 98
Reputation: 3553
You could try:
y = re.findall(r'((?:(?:.+?)(?:(?=[\n\r][^\n\r])\n|))+)', text)
Which returns ['t1\nt2\nt3'] for 't1\nt2\nt3\n'. If you simply want the string, you can get it with:
s = y[0]
Although it is much longer than your original pattern, it gives you the desired string.
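A minimal check of the pattern with Python's standard re module; the sample strings are illustrative, not from the original post. Worth noting: the pattern never refers to '>', so on input laid out like the question's it returns the consecutive non-empty lines as one block, with the '>' markers and the other lines included.

```python
import re

# The pattern from this answer, written as a raw string.
PATTERN = r'((?:(?:.+?)(?:(?=[\n\r][^\n\r])\n|))+)'

y = re.findall(PATTERN, 't1\nt2\nt3\n')
print(y)        # ['t1\nt2\nt3']
s = y[0]        # the joined string on its own

# The pattern never mentions '>', so on input laid out like the question's,
# consecutive non-empty lines come back as one block, markers included.
q = '>test1\n(something else here)\n>test2\n(something else here)\n'
print(re.findall(PATTERN, q))
```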
((?:(?:.+?)(?:(?=[\n\r][^\n\r])\n|))+)
is the whole regex; its outer parentheses form the only capturing group, so findall returns the entire match.
(?:(?:.+?)(?:(?=[\n\r][^\n\r])\n|))
is the non-capturing group that matches a run of text optionally followed by a newline; the + after it repeats this group one or more times.
(?:.+?)
lazily matches one or more non-newline characters, i.e. the actual words, which may then be followed by a newline.
(?:(?=[\n\r][^\n\r])\n|)
is a non-capturing group with two alternatives that works like a conditional: if the matched text is followed by a newline, it matches that newline, provided the newline is followed by a character other than a newline or carriage return; otherwise the empty alternative after the | matches and nothing is consumed.
(?=[\n\r][^\n\r])
is a positive look-ahead which checks that the text found is followed by a newline or carriage return and then a character that is neither; combined with the \n| after it, this tells the regex to match the newline only in that case.
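A small sketch of what the look-ahead and the empty alternative do in practice, using only the standard re module; the sample strings are made up for illustration. A newline followed by another newline (a blank line) or by the end of the string is not consumed, so a blank line splits the text into separate matches.

```python
import re

PATTERN = r'((?:(?:.+?)(?:(?=[\n\r][^\n\r])\n|))+)'

# The newline after 'b' is followed by another newline, so it is not
# consumed and the blank line starts a new match.
print(re.findall(PATTERN, 'a\nb\n\nc\n'))  # ['a\nb', 'c']

# The look-ahead on its own: a newline/CR followed by a non-newline char?
LOOKAHEAD = re.compile(r'(?=[\n\r][^\n\r])')
print(bool(LOOKAHEAD.match('\nx')))   # True
print(bool(LOOKAHEAD.match('\n\n')))  # False
print(bool(LOOKAHEAD.match('\n')))    # False: end of string follows
```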
Granted, having typed all this out, the regex is long and complicated, so you may be better off using an answer you understand rather than this one. However, it seems to be the only single-regex answer that gives the exact output you want.
Upvotes: 1
Reputation: 271185
re.search only returns the first match, as documented:
Scan through string looking for the first location where the regular expression pattern produces a match, and return a corresponding MatchObject instance.
To find all the matches, use findall.
Return all non-overlapping matches of pattern in string, as a list of strings. The string is scanned left-to-right, and matches are returned in the order found.
Here's an example from the shell:
>>> import re
>>> re.findall(">(.*)\n", ">test1\nxxx>test2\nxxx")
['test1', 'test2']
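If you would rather keep working with match objects, as re.search returns, re.finditer yields one match object per non-overlapping match. A minimal sketch, reusing the sample string from the shell session above:

```python
import re

text = ">test1\nxxx>test2\nxxx"

# finditer yields a match object for each match, so .group(1) works
# just as it does on the result of re.search.
results = [m.group(1) for m in re.finditer(r">(.*)\n", text)]
print(results)  # ['test1', 'test2']
```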
Edit: I just read your question again and realised that you want "test1\ntest2" as output. In that case, just join the list with \n:
>>> "\n".join(re.findall(">(.*)\n", ">test1\nxxx>test2\nxxx"))
'test1\ntest2'
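Applied to input laid out like the question's, as a sketch; the function name extract_after_gt is hypothetical. One assumption worth stating: ">(.*)\n" requires a newline after the text, so a '>' line at the very end of the string with no trailing newline is skipped. Because '.' does not match a newline by default, dropping the \n from the pattern still stops at the end of each line and also catches that last line.

```python
import re

def extract_after_gt(text):
    """Hypothetical helper: join the text after each '>' with newlines.

    '.' does not match a newline by default, so '>(.*)' stops at the end
    of each line, and a final line without a trailing newline still counts.
    """
    return "\n".join(re.findall(r">(.*)", text))

sample = ">test1\n(something else here)\n>test2\n(something else here)"
print(repr(extract_after_gt(sample)))  # 'test1\ntest2'

# With the original pattern, a last '>' line lacking a newline is missed:
print(re.findall(r">(.*)\n", ">test1\nxxx\n>test2"))  # ['test1']
print(re.findall(r">(.*)", ">test1\nxxx\n>test2"))    # ['test1', 'test2']
```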
Upvotes: 2