zzhandles

Reputation: 27

How to output uppercase letters with a regex in Python

I have a string like the following:

element = ['ABCa4.daf<<tag1>>permission : wiadsfth.accedsafsds.INTERNET<<tag2>>',]

I am trying to use re.findall to output only the uppercase letters at the end of the string (just before <<tag2>>). Here is what I did:

re.findall('<<tag1>>' + "(.*?)" + '<<tag2>>', element[0])

but the result also includes the other characters before 'INTERNET'. Since those characters change all the time, I can't anchor the pattern on them either.

Can anybody shed some light? Thank you so much!

Upvotes: 2

Views: 47

Answers (2)

gog

Reputation: 11347

Just match "any sequence of uppercase letters, followed by <<tag2>>":

re.findall(r'[A-Z]+(?=<<tag2>>)', element[0])

or

re.findall(r'[A-Z]+(?=[^<>]*<<tag2>>)', element[0])

to also handle input like INTERNET foobar <<tag2>>, where other non-bracket characters sit between the uppercase run and the tag.
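To make the difference between the two lookaheads concrete, here is a small sketch that runs both patterns over two inputs. The sample strings and variable names are made up for illustration and are not from the question.

```python
import re

# Hypothetical sample inputs (not from the original question).
tight = 'a<<tag1>>x.INTERNET<<tag2>>'          # uppercase run touches <<tag2>>
loose = 'a<<tag1>>x.INTERNET foobar <<tag2>>'  # extra text before <<tag2>>

strict_pat = r'[A-Z]+(?=<<tag2>>)'         # run must be directly before the tag
relaxed_pat = r'[A-Z]+(?=[^<>]*<<tag2>>)'  # allows non-bracket text in between

strict_tight = re.findall(strict_pat, tight)
strict_loose = re.findall(strict_pat, loose)
relaxed_tight = re.findall(relaxed_pat, tight)
relaxed_loose = re.findall(relaxed_pat, loose)

print(strict_tight, strict_loose)    # ['INTERNET'] []
print(relaxed_tight, relaxed_loose)  # ['INTERNET'] ['INTERNET']
```

The lookahead (?=...) only checks what follows, so the tag itself never ends up in the result.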

Finally, to match any sequence of A-Z at any position between start and end tags, you're going to need this little monster:

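import re

rr = r"""(?x)
    [A-Z]+                  # a run of uppercase letters...
    (?=                     # ...that is followed by
        (?:
            (?! <<tag1>>) . # any character that does not begin a new <<tag1>>
        ) *
        <<tag2>>            # and then the closing tag
    )
"""

element = ['ABC xyz DEF <<tag1>> permission : INTERNET foo XYZ bar <<tag2>>',]
print(re.findall(rr, element[0]))  # ['INTERNET', 'XYZ']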
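If the single-pattern version above is hard to maintain, a two-step alternative may be easier to read: first cut out the text between the tags, then collect the uppercase runs inside it. This is a different technique from the lookahead pattern, not a rewrite of it, and the function name and default arguments below are assumptions chosen for illustration.

```python
import re

def uppercase_runs_between(text, start='<<tag1>>', end='<<tag2>>'):
    """Return uppercase runs found inside each start/end tag pair (hypothetical helper)."""
    # re.escape keeps the tag text literal even if it contains regex metacharacters.
    span_pat = re.escape(start) + r'(.*?)' + re.escape(end)
    runs = []
    for inner in re.findall(span_pat, text, flags=re.DOTALL):
        runs.extend(re.findall(r'[A-Z]+', inner))
    return runs

sample = 'ABC xyz DEF <<tag1>> permission : INTERNET foo XYZ bar <<tag2>>'
result = uppercase_runs_between(sample)
print(result)  # ['INTERNET', 'XYZ']
```

Note that the two approaches are not strictly equivalent: this one requires an opening tag, whereas the lookahead pattern also matches uppercase runs before a <<tag2>> that has no <<tag1>> in front of it.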

Upvotes: 1

alecxe

Reputation: 473873

You need to allow any characters before the [A-Z]+:

>>> import re
>>> s = 'ABCa4.daf<<tag1>>permission : wiadsfth.accedsafsds.INTERNET<<tag2>>'
>>> re.findall('<<tag1>>.*?([A-Z]+)<<tag2>>', s)
['INTERNET']

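.*? is a non-greedy match for any run of characters, so it consumes as little as possible before the capture group. [A-Z]+ matches one or more uppercase letters, and because it is the only capture group, findall returns just that part.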
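As a quick illustration of why the non-greedy quantifier matters here, the sketch below runs the same pattern with .*? and with a greedy .* on the string from the question. The variable names are only for illustration.

```python
import re

s = 'ABCa4.daf<<tag1>>permission : wiadsfth.accedsafsds.INTERNET<<tag2>>'

# Lazy: .*? stops as early as it can, leaving the whole uppercase run to the group.
lazy = re.findall(r'<<tag1>>.*?([A-Z]+)<<tag2>>', s)

# Greedy: .* takes as much as it can and gives back only the single
# letter the group needs in order to match.
greedy = re.findall(r'<<tag1>>.*([A-Z]+)<<tag2>>', s)

print(lazy)    # ['INTERNET']
print(greedy)  # ['T']
```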

Upvotes: 4
