Reputation: 27
I have a string like the following:
element = ['ABCa4.daf<<tag1>>permission : wiadsfth.accedsafsds.INTERNET<<tag2>>',]
I am trying to use the regular expression function findall to output only the uppercase letters at the end of the string (just before <<tag2>>). Here is what I tried:
re.findall('<<tag1>>' + "(.*?)" + '<<tag2>>', element[0])
But the result also includes the other characters before 'INTERNET'. Since those characters change all the time, I can't use them as a tag either.
Can anybody shed some light on this? Thank you so much!
Upvotes: 2
Views: 47
Reputation: 11347
Just match "any sequence of uppercase letters, followed by <<tag2>>":
re.findall(r'[A-Z]+(?=<<tag2>>)', element[0])
or
re.findall(r'[A-Z]+(?=[^<>]*<<tag2>>)', element[0])
to handle input like INTERNET foobar <<tag2>>.
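To see how the two lookahead patterns differ, here is a small sketch. The first sample string is the one from the question; the second is a made-up variant with extra text before the closing tag, and the variable names are only illustrative.

```python
import re

# 'tight' is the string from the question; 'loose' is a hypothetical variant
# with extra lowercase text between the uppercase run and <<tag2>>.
tight = 'ABCa4.daf<<tag1>>permission : wiadsfth.accedsafsds.INTERNET<<tag2>>'
loose = 'ABCa4.daf<<tag1>>permission : INTERNET foobar <<tag2>>'

# Uppercase run that must be immediately followed by <<tag2>>.
strict = re.compile(r'[A-Z]+(?=<<tag2>>)')
# Uppercase run followed by any non-bracket characters, then <<tag2>>.
relaxed = re.compile(r'[A-Z]+(?=[^<>]*<<tag2>>)')

strict_tight = strict.findall(tight)
strict_loose = strict.findall(loose)
relaxed_tight = relaxed.findall(tight)
relaxed_loose = relaxed.findall(loose)

print(strict_tight)   # ['INTERNET']
print(strict_loose)   # []
print(relaxed_tight)  # ['INTERNET']
print(relaxed_loose)  # ['INTERNET']
```

The lookahead (?=...) only checks what follows and consumes nothing, so the closing tag never ends up in the result.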
Finally, to match any sequence of A-Z at any position between the start and end tags, you're going to need this little monster:
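import re

rr = r"""(?x)
    [A-Z]+                  # a run of uppercase letters
    (?=                     # lookahead: checks what follows, consumes nothing
        (?:
            (?! <<tag1>>) . # any character that does not start <<tag1>>
        ) *
        <<tag2>>            # then the closing tag
    )
"""
element = ['ABC xyz DEF <<tag1>> permission : INTERNET foo XYZ bar <<tag2>>',]
print(re.findall(rr, element[0]))  # ['INTERNET', 'XYZ']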
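If the single pattern feels hard to maintain, a two-step version may be easier to read. This is only a sketch of an alternative, not the pattern above: the helper name uppercase_runs_between_tags is made up for illustration, and it assumes the tags are not nested and sit on one line.

```python
import re

def uppercase_runs_between_tags(text):
    """Hypothetical helper: uppercase runs found between <<tag1>> and <<tag2>>."""
    runs = []
    # Step 1: capture each span enclosed by the two tags (non-greedy).
    for span in re.findall(r'<<tag1>>(.*?)<<tag2>>', text):
        # Step 2: collect every run of uppercase letters inside that span.
        runs.extend(re.findall(r'[A-Z]+', span))
    return runs

sample = 'ABC xyz DEF <<tag1>> permission : INTERNET foo XYZ bar <<tag2>>'
result = uppercase_runs_between_tags(sample)
print(result)  # ['INTERNET', 'XYZ']
```

On this sample it gives the same result as the lookahead pattern; unlike that pattern, it only looks at text that has an opening <<tag1>> in front of it.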
Upvotes: 1
Reputation: 473873
You need to allow any characters before the [A-Z]+:
>>> import re
>>> s = 'ABCa4.daf<<tag1>>permission : wiadsfth.accedsafsds.INTERNET<<tag2>>'
>>> re.findall('<<tag1>>.*?([A-Z]+)<<tag2>>', s)
['INTERNET']
.*? is a non-greedy match for any character. [A-Z]+ matches one or more uppercase letters.
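For comparison, here is a short sketch that runs the pattern from the question next to the adjusted one on the same string. The variable names are only illustrative.

```python
import re

s = 'ABCa4.daf<<tag1>>permission : wiadsfth.accedsafsds.INTERNET<<tag2>>'

# Pattern from the question: the group captures everything between the tags.
whole_span = re.findall(r'<<tag1>>(.*?)<<tag2>>', s)

# Adjusted pattern: .*? sits outside the group, so only the uppercase run
# directly before <<tag2>> is captured.
upper_only = re.findall(r'<<tag1>>.*?([A-Z]+)<<tag2>>', s)

print(whole_span)  # ['permission : wiadsfth.accedsafsds.INTERNET']
print(upper_only)  # ['INTERNET']
```

Because the pattern contains exactly one capturing group, findall returns just that group's text rather than the whole match.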
Upvotes: 4