Reputation: 156
I'm trying to match a number of sub-strings within a string.
The areas of interest are in the format of:
Sample1: "text text text[One]"
Sample2: "text text text[One/Two]"
Sample3: "text text text[One/Two/Three]"
I'm trying to get a list of the numbers using regex in the following way:
numbers = re.findall(r'(\[|\/)(\w+)(\/|\])', str)
However, the second group only gives:
#Sample1
['One']
#Sample2
['One']
#Sample3
['One','Three']
No matter what I try, I can't get it to match the second number, the one between a '/' and either a ']' or a '/'. I don't understand why it isn't matching '/Two/', since the '/' character is an option in both alternations.
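A short diagnostic sketch makes the behaviour visible (the names `s` and `pattern` are chosen here for illustration). re.finditer reports the span of each match, and the first match already includes the '/' after 'One'. The engine therefore resumes searching at 'Two/Three]' and cannot reuse that slash as the opening delimiter of another match. It also shows that, when the pattern has several groups, re.findall returns one tuple per match rather than just group 2.

```python
import re

s = "text text text[One/Two/Three]"
pattern = r'(\[|/)(\w+)(/|\])'

# With several groups, findall returns one tuple of groups per match
print(re.findall(pattern, s))  # [('[', 'One', '/'), ('/', 'Three', ']')]

# finditer exposes each match's span: the first match ends *after* the '/',
# so that slash is consumed and cannot start the next match
for m in re.finditer(pattern, s):
    print(m.span(), repr(m.group(0)), m.group(2))
```

The loop prints (14, 19) '[One/' One and then (22, 29) '/Three]' Three. Nothing starts at index 19, where 'Two' begins.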
I've also tried framing it in a different way with the following regex:
re.findall(r'[\[]?[\/]?(\w+)[\/]?[\]]?', str)
and although it gives me the desired results, it also gives me all the words in the preceding text as well.
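A quick check shows why this happens (the test string and the name `loose` are illustrative). Every delimiter in that pattern is optional, so nothing ties a match to the brackets. On this input it behaves exactly like a plain \w+.

```python
import re

s = "text text text[One/Two/Three]"
loose = r'[\[]?[\/]?(\w+)[\/]?[\]]?'

# All four delimiter classes are optional ('?'), so any run of word
# characters matches, including the words before the brackets
print(re.findall(loose, s))   # ['text', 'text', 'text', 'One', 'Two', 'Three']
print(re.findall(r'\w+', s))  # same list
```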
Any advice appreciated.
Upvotes: 3
Views: 273
Reputation: 17602
If you are sure that your target strings will always be in the format you've shown, why not first extract the slash-separated numbers between the brackets and then split the result on '/'?
import re
Sample3 = "text text text[One/Two/Three]"
re.findall(r'\[(.*)\]', Sample3)[0].split('/')
Output:
['One', 'Two', 'Three']
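A possible variant of the same idea, sketched under the assumption that each string contains at most one bracketed group. The helper name `bracket_items` is hypothetical. re.search returns the first match or None, so there is no list to index with [0], and a string without brackets yields an empty list instead of an IndexError.

```python
import re

def bracket_items(text):
    """Return the '/'-separated items inside the first [...] group, or []."""
    # [^\]]* stops at the first ']' instead of running to the last one like .*
    m = re.search(r'\[([^\]]*)\]', text)
    return m.group(1).split('/') if m else []

print(bracket_items("text text text[One/Two/Three]"))  # ['One', 'Two', 'Three']
print(bracket_items("text without brackets"))          # []
```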
Upvotes: 1
Reputation: 16224
You can try this regex too:
import re

Sample1 = "text text text[One]"
Sample2 = "text text text[One/Two]"
Sample3 = "text text text[One/Two/Three]"
lines = [Sample1, Sample2, Sample3]
# capture the text between '[' and ']'; its last character must not be '/'
subres = [re.findall(r"\[(.+[^\/])\]", s) for s in lines]
# split the first captured group of each line on '/'
result = [res[0].split('/') for res in subres]
print(result)
Output:
[['One'], ['One', 'Two'], ['One', 'Two', 'Three']]
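One caveat is worth checking before relying on this pattern (the test strings below are illustrative, not from the question). '.+' needs at least one character and '[^\/]' exactly one more. Bracket contents shorter than two characters, or ending in '/', therefore produce no match, and res[0] would then raise an IndexError.

```python
import re

pattern = r"\[(.+[^\/])\]"

print(re.findall(pattern, "text text text[One]"))   # ['One']
# a single character inside the brackets cannot satisfy '.+' plus '[^\/]'
print(re.findall(pattern, "text text text[A]"))     # []
# a '/' right before ']' is rejected by '[^\/]'
print(re.findall(pattern, "text text text[One/]"))  # []
```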
Upvotes: 1
Reputation: 82919
Use lookbehind and lookahead so that the '[', '/' and ']' are not part of the match:
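>>> import re
>>> samples = ["text text text[One]", "text text text[One/Two]", "text text text[One/Two/Three]"]
>>> [re.findall(r'(?<=\[|/)\w+(?=/|\])', s) for s in samples]
[['One'], ['One', 'Two'], ['One', 'Two', 'Three']]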
This way, the intermediate '/' can be used by two matches: the lookarounds only check that it is there and do not consume it.
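To see the difference from the question's pattern, here is a sketch (the names are illustrative) that prints each match's span. The spans cover only the words, so the slashes at indices 18 and 22 are never consumed. Each slash can serve both the match before it and the match after it.

```python
import re

s = "text text text[One/Two/Three]"
lookaround = r'(?<=\[|/)\w+(?=/|\])'

# Lookbehind and lookahead are zero-width: they test the neighbouring
# character without including it in the match
for m in re.finditer(lookaround, s):
    print(m.span(), m.group())
```

This prints (15, 18) One, (19, 22) Two and (23, 28) Three.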
Upvotes: 1
Reputation: 71461
You can try this:
import re

s = ["text text text[One]", "text text text[One/Two]", "text text text[One/Two/Three]"]
# take the first non-greedy [...] group of each string and split it on '/'
final_data = [re.findall(r'\[(.*?)\]', i)[0].split('/') for i in s]
print(final_data)
Output:
[['One'], ['One', 'Two'], ['One', 'Two', 'Three']]
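The non-greedy .*? in this answer matters when a string contains more than one bracketed group, which the question's samples do not. The string below is a hypothetical example. The greedy .* runs to the last ']' in the string, while .*? stops at the first one.

```python
import re

text = "a[One/Two] b[Three]"  # hypothetical string with two bracketed groups

print(re.findall(r'\[(.*)\]', text))   # ['One/Two] b[Three']
print(re.findall(r'\[(.*?)\]', text))  # ['One/Two', 'Three']
```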
Upvotes: 1