Reputation: 91
I have a text that looks like:
"<ENAMEX TYPE="PERSON">Edward R. Kimmel</ENAMEX>, one of Admiral <ENAMEX TYPE="PERSON">Jack</ENAMEX>'s two surviving sons and..."
I want output like the following:
PERSON Edward R. Kimmel
PERSON Jack
Any idea how to do this with a regex?
Thanks a lot
Upvotes: 0
Views: 170
Reputation: 5927
Simply use re.findall:
import re
x = '"<ENAMEX TYPE="PERSON">Edward R. Kimmel</ENAMEX>, one of Admiral <ENAMEX TYPE="PERSON">Jack</ENAMEX>"'
# Lazily capture the text between each opening PERSON tag and the next "<"
mac = re.findall(r'TYPE="PERSON">(.+?)<', x)
for i in mac:
    print("PERSON " + i)
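If the label should come from the text instead of being hard-coded, a minimal sketch along the same lines could capture both the TYPE attribute and the enclosed name. The names ENAMEX_RE and extract_entities are hypothetical, and this assumes the tags are never nested and that TYPE is the only attribute.

```python
import re

# Assumption: ENAMEX tags are never nested and TYPE is their only attribute.
ENAMEX_RE = re.compile(r'<ENAMEX TYPE="([^"]+)">(.+?)</ENAMEX>')


def extract_entities(text):
    """Return a list of (type, name) tuples, one per ENAMEX tag in text."""
    return ENAMEX_RE.findall(text)


sample = ('<ENAMEX TYPE="PERSON">Edward R. Kimmel</ENAMEX>, one of Admiral '
          '<ENAMEX TYPE="PERSON">Jack</ENAMEX>\'s two surviving sons and...')

for entity_type, name in extract_entities(sample):
    print(entity_type, name)
```

With two capture groups, re.findall returns tuples, so other tag types would be picked up without changing the pattern.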
Upvotes: 0
Reputation: 433
Did you try BeautifulSoup?
from bs4 import BeautifulSoup
txt = """<ENAMEX TYPE="PERSON">Edward R. Kimmel</ENAMEX>, one of Admiral <ENAMEX TYPE="PERSON">Jack</ENAMEX>'s two surviving sons and..."""
soup = BeautifulSoup(txt, "html.parser")
# html.parser lowercases attribute names, so TYPE is matched as 'type'
for i in soup.find_all(attrs={'type': 'PERSON'}):
    print(i.text)
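If installing bs4 is not an option, roughly the same idea can be sketched with the standard library's html.parser module. The class name EnamexParser is hypothetical, and the sketch assumes ENAMEX tags are not nested.

```python
from html.parser import HTMLParser


class EnamexParser(HTMLParser):
    """Collect (type, text) pairs for ENAMEX tags (hypothetical helper)."""

    def __init__(self):
        super().__init__()
        self.entities = []
        self._type = None    # TYPE of the ENAMEX tag currently open, if any
        self._chunks = []    # text pieces seen inside the open tag

    def handle_starttag(self, tag, attrs):
        # HTMLParser lowercases tag names and attribute names.
        if tag == "enamex":
            self._type = dict(attrs).get("type")
            self._chunks = []

    def handle_data(self, data):
        if self._type is not None:
            self._chunks.append(data)

    def handle_endtag(self, tag):
        if tag == "enamex" and self._type is not None:
            self.entities.append((self._type, "".join(self._chunks)))
            self._type = None


parser = EnamexParser()
parser.feed('<ENAMEX TYPE="PERSON">Edward R. Kimmel</ENAMEX>, one of Admiral '
            '<ENAMEX TYPE="PERSON">Jack</ENAMEX>\'s two surviving sons and...')
parser.close()

for entity_type, name in parser.entities:
    print(entity_type, name)
```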
Upvotes: 2