GothicAnatomist

Reputation: 156

Python Regex boolean 'or' doesn't select all matches

I'm trying to match a number of sub-strings within a string.

The areas of interest are in the format of:

Sample1: "text text text[One]"
Sample2: "text text text[One/Two]"
Sample3: "text text text[One/Two/Three]"

I'm trying to get a list of the numbers using regex in the following way:

numbers = re.findall(r'(\[|\/)(\w+)(\/|\])', str)

However, group2 produces:

#Sample1
['One']
#Sample2
['One']
#Sample3
['One','Three']

No matter what I try, I can't get it to match the second number, which sits between a '/' and either a ']' or another '/'. I don't understand why it isn't matching '/Two/', since the '/' character is an option in both the opening and the closing group.

I've also tried framing it in a different way with the following regex:

re.findall(r'[\[]?[\/]?(\w+)[\/]?[\]]?', str)

and although it gives me the desired results, it also gives me all the words in the preceding text as well.
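A minimal reproduction of that second attempt, run against Sample2 only. The variable names `sample` and `words` are placeholders for illustration, not part of the original code:

```python
import re

sample = "text text text[One/Two]"  # Sample2 from above

# Every delimiter in the pattern is optional, so a bare word with no
# bracket or slash next to it still satisfies the pattern.
words = re.findall(r'[\[]?[\/]?(\w+)[\/]?[\]]?', sample)
print(words)  # ['text', 'text', 'text', 'One', 'Two']
```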

Any advice appreciated.

Upvotes: 3

Views: 273

Answers (4)

Sнаđошƒаӽ

Reputation: 17602

If you are sure that your target strings are always going to be in the format you've shown, then why not just find all the numbers separated by slashes first, and then split the result on /?

import re
Sample3 = "text text text[One/Two/Three]"
re.findall(r'\[(.*)\]', Sample3)[0].split('/')

Output:

['One', 'Two', 'Three']

Upvotes: 1

developer_hatch

Reputation: 16224

You can try this regex too:

import re
Sample1 = "text text text[One]"
Sample2 = "text text text[One/Two]"
Sample3 = "text text text[One/Two/Three]"
lines = [Sample1, Sample2, Sample3]
# Capture everything between the brackets, then split it on '/'
subres = [re.findall(r"\[(.+[^\/])\]", s) for s in lines]
result = [res[0].split('/') for res in subres]

print(result)

result:

[['One'], ['One', 'Two'], ['One', 'Two', 'Three']]

Upvotes: 1

tobias_k

Reputation: 82919

Use lookbehind and lookahead so the [, / and ] are not part of the match:

>>> [re.findall(r'(?<=\[|\/)\w+(?=\/|\])', s) for s in samples]
[['One'], ['One', 'Two'], ['One', 'Two', 'Three']]

This way, the intermediate / can be used for two matches.
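To illustrate the point, here is a small sketch comparing the two approaches on Sample3. It assumes the asker's original pattern, and the names `consuming` and `lookaround` are made up for this example. Matches found by `re.finditer` and `re.findall` never overlap, so once `[One/` has been matched, its trailing slash is no longer available as the leading delimiter for `Two`:

```python
import re

sample = "text text text[One/Two/Three]"

# Consuming version: each match includes its delimiters, so "[One/" uses up
# the slash that "Two" would need in front of it.
consuming = [m.group(0) for m in re.finditer(r'(\[|/)(\w+)(/|\])', sample)]

# Lookaround version: the delimiters are checked but not consumed, so the
# same slash can close one match and open the next.
lookaround = re.findall(r'(?<=[\[/])\w+(?=[/\]])', sample)

print(consuming)   # ['[One/', '/Three]']
print(lookaround)  # ['One', 'Two', 'Three']
```

Note that Python's `re` module only accepts fixed-width lookbehinds; that works here because both `[` and `/` are a single character.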

Upvotes: 1

Ajax1234

Reputation: 71461

You can try this:

import re
s = ["text text text[One]", "text text text[One/Two]", "text text text[One/Two/Three]"]
final_data = [[b.split('/') for b in re.findall(r'\[(.*?)\]', i)][0] for i in s]

Output:

[['One'], ['One', 'Two'], ['One', 'Two', 'Three']]

Upvotes: 1
