ALS_WV

Reputation: 91

Extracting named entities from ENAMEX-tagged text in Python 2.7

I have text that looks like this:

"<ENAMEX TYPE="PERSON">Edward R. Kimmel</ENAMEX>, one of Admiral <ENAMEX TYPE="PERSON">Jack</ENAMEX>'s two surviving sons and..."

I want output like the following:

PERSON Edward R. Kimmel

PERSON Jack

Any idea how to do this with a regex?

Thanks a lot

Upvotes: 0

Views: 170

Answers (2)

mkHun

Reputation: 5927

Simply use re.findall:

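import re

x = '<ENAMEX TYPE="PERSON">Edward R. Kimmel</ENAMEX>, one of Admiral <ENAMEX TYPE="PERSON">Jack</ENAMEX>'
# Non-greedy capture of the text between each PERSON opening tag and the next "<"
mac = re.findall(r'TYPE="PERSON">(.+?)<', x)

for i in mac:
    # A single parenthesized argument prints the same under Python 2.7 and 3
    print("PERSON " + i)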
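
The pattern above is hard-coded to PERSON. If the text also contains other entity types, one possible variation is to capture the TYPE attribute as well. This is only a sketch: the name extract_entities is made up for illustration, and it assumes the ENAMEX tags are well formed, not nested, and carry TYPE as their only attribute.

```python
import re

# Assumption: tags look exactly like <ENAMEX TYPE="...">...</ENAMEX> and are never nested.
ENAMEX_RE = re.compile(r'<ENAMEX\s+TYPE="([^"]+)">(.*?)</ENAMEX>', re.DOTALL)


def extract_entities(text):
    """Hypothetical helper: return (entity_type, entity_text) tuples in document order."""
    return ENAMEX_RE.findall(text)


sample = ('<ENAMEX TYPE="PERSON">Edward R. Kimmel</ENAMEX>, one of Admiral '
          '<ENAMEX TYPE="PERSON">Jack</ENAMEX>\'s two surviving sons and...')

for entity_type, entity_text in extract_entities(sample):
    # Single %-formatted argument, so this runs under Python 2.7 and 3
    print("%s %s" % (entity_type, entity_text))
```

For the sample text from the question this prints "PERSON Edward R. Kimmel" and "PERSON Jack", one per line. With two capture groups, re.findall returns a list of tuples, which is why the loop unpacks two values.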

Upvotes: 0

KR29

Reputation: 433

Have you tried BeautifulSoup?

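from bs4 import BeautifulSoup

txt = """<ENAMEX TYPE="PERSON">Edward R. Kimmel</ENAMEX>, one of Admiral <ENAMEX TYPE="PERSON">Jack</ENAMEX>'s two surviving sons and..."""
soup = BeautifulSoup(txt, "html.parser")
# html.parser lowercases attribute names, so TYPE is matched as 'type'
for i in soup.find_all(attrs={'type': 'PERSON'}):
    # Print the entity type followed by the tagged text, as the question asks
    print(i['type'] + " " + i.text)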

Upvotes: 2
