Mario

Reputation: 95

Regex only finds results once

I'm trying to find any text between a '>' character and a new line, so I came up with this regex:

result = re.search(">(.*)\n", text).group(1)

It works perfectly with only one result, such as:

>test1
(something else here)

Where the result, as intended, is

test1

But whenever there's more than one result, it only shows the first one, like in:

>test1
(something else here)
>test2
(something else here)

Which should give something like

test1\ntest2

But instead just shows

test1

What am I missing? Thank you very much in advance.

Upvotes: 0

Views: 98

Answers (2)

Robo Mop

Reputation: 3553

You could try:

y = re.findall(r'((?:(?:.+?)(?:(?=[\n\r][^\n\r])\n|))+)', text)

Which returns ['t1\nt2\nt3'] for 't1\nt2\nt3\n'. If you simply want the string, you can get it by:

s = y[0]

Although it seems much larger than your initial code, it will give you your desired string.

Explanation -

((?:(?:.+?)(?:(?=[\n\r][^\n\r])\n|))+) is the whole regex. Its outer capturing group spans the entire match, so that is what findall returns.

(?:(?:.+?)(?:(?=[\n\r][^\n\r])\n|)) is the non-capturing group that matches any text followed by a newline, and is repeatedly found one-or-more times by the + after it.

(?:.+?) lazily matches one or more characters other than a newline, i.e. the text of a line.
(?:(?=[\n\r][^\n\r])\n|) is a non-capturing group with two alternatives: it matches a newline if the text is followed by a line break that is itself followed by something other than a newline or carriage return; otherwise it matches nothing.
(?=[\n\r][^\n\r]) is a positive look-ahead which asserts that the text found is followed by a newline or carriage return, and then a non-newline character. Combined with the \n| after it, this tells the regex to consume the newline only in that case.
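To make the breakdown above easier to follow, here is the same pattern written with re.VERBOSE so each piece sits on its own line. This is only a sketch for checking the behaviour; the name PATTERN and the sample strings are assumptions for illustration, not part of the original answer.

```python
import re

# Same regex as above, spread out with re.VERBOSE (PATTERN is an illustrative name).
PATTERN = re.compile(r"""
    (                        # capturing group: the whole run of lines
      (?:
        .+?                  # lazily match at least one non-newline character
        (?:
          (?=[\n\r][^\n\r])  # if a line break is followed by more text
          \n                 # then consume the newline
          |                  # otherwise consume nothing
        )
      )+
    )
""", re.VERBOSE)

print(PATTERN.findall('t1\nt2\nt3\n'))  # ['t1\nt2\nt3']

# The pattern contains no '>', so on input shaped like the question's
# it keeps the '>' characters and the lines in between.
sample = '>test1\n(something else here)\n>test2\n(something else here)\n'
print(PATTERN.findall(sample))
```

On the second sample the result is a single string holding everything except the final newline, so some extra filtering would still be needed to end up with only test1 and test2.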

Granted, after typing this big mess out, the regex is pretty long and complicated, so you would be better off implementing the answers you understand, rather than this answer, which you may not. However, this seems to be the only one-line answer to get the exact output you desire.

Upvotes: 1

Sweeper

Reputation: 271185

re.search only returns the first match, as documented:

Scan through string looking for the first location where the regular expression pattern produces a match, and return a corresponding MatchObject instance.

To find all the matches, use findall.

Return all non-overlapping matches of pattern in string, as a list of strings. The string is scanned left-to-right, and matches are returned in the order found.

Here's an example from the shell:

>>> import re
>>> re.findall(">(.*)\n", ">test1\nxxx>test2\nxxx")
['test1', 'test2']

Edit: I just read your question again and realised that you want "test1\ntest2" as output. Well, just join the list with \n:

>>> "\n".join(re.findall(">(.*)\n", ">test1\nxxx>test2\nxxx"))
'test1\ntest2'
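One thing to watch for: with the \n in the pattern, a '>' on the last line is missed if the text does not end with a newline. Since . does not match a newline by default, a possible variation is to drop the \n from the pattern. The sketch below assumes that is acceptable for your input; the helper name extract_after_gt is made up for illustration.

```python
import re

def extract_after_gt(text):
    # Hypothetical helper. "." never matches "\n" by default, so ">(.*)"
    # stops at the end of the line whether or not a newline follows.
    return "\n".join(re.findall(r">(.*)", text))

# No trailing newline after the last ">" line:
print(re.findall(">(.*)\n", ">test1\nxxx>test2"))  # ['test1']
print(repr(extract_after_gt(">test1\nxxx>test2")))  # 'test1\ntest2'
```

If the text always ends with a newline, both patterns give the same result.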

Upvotes: 2
