Regex pattern to extract tag and its contents

Question

Consider this input:

input = """Yesterday<person>Peter</person>drove to<location>New York</location>"""

How can one use regex patterns to extract:

person: Peter
location: New York

This works well, but I don't want to hard-code the tags, because they can change:

print(re.findall("<person>(.*?)</person>", input))
print(re.findall("<location>(.*?)</location>", input))

PyNEwbie · Accepted Answer

Use a tool designed for the work. I happen to like lxml, but there are others.

>>> minput = """Yesterday<person>Peter Smith</person>drove to<location>New York</location>"""
>>> from lxml import html
>>> tree = html.fromstring(minput)
>>> for e in tree.iter():
        print(e, e.tag, e.text_content())
        if e.tag == 'person':            # e.tag is a string attribute, not a method
            last = e.text_content().split()[-1]   # getting the last name per comment
            print(last)


<Element p at 0x...> p YesterdayPeter Smithdrove toNew York
<Element person at 0x...> person Peter Smith
Smith                                            # here is the last name
<Element location at 0x...> location New York
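
The same idea can collect the results by tag name instead of printing them, so that no tag is hard-coded. The following is only a minimal sketch: it swaps in the standard library's xml.etree.ElementTree for lxml (so nothing needs installing), and it assumes the fragment is well-formed XML once wrapped in a hypothetical <root> element. The names text, root and found are illustrative.

```python
import xml.etree.ElementTree as ET

text = "Yesterday<person>Peter Smith</person>drove to<location>New York</location>"

# Assumption: wrapping the fragment in a made-up <root> element yields valid XML.
root = ET.fromstring("<root>" + text + "</root>")

# Map each tag name to the list of texts found under it; no tag name is hard-coded.
found = {}
for elem in root:
    found.setdefault(elem.tag, []).append("".join(elem.itertext()))

print(found)  # {'person': ['Peter Smith'], 'location': ['New York']}
```

Input that is not well-formed XML makes ET.fromstring raise ParseError; lxml's HTML parser is more forgiving in that case.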

If you are new to Python, you might want to visit this site to get an installer for a number of packages, including lxml.
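
If a regex is still required, a backreference can match whatever tag name appears without hard-coding it. This is a hedged sketch rather than a robust parser: it assumes tags carry no attributes and are never nested, and the names text and pairs are illustrative.

```python
import re

text = "Yesterday<person>Peter</person>drove to<location>New York</location>"

# Group 1 captures the tag name; \1 requires the closing tag to repeat it.
# Group 2 lazily captures the text between the opening and closing tags.
pairs = re.findall(r"<(\w+)>(.*?)</\1>", text)

for tag, content in pairs:
    print("%s: %s" % (tag, content))
# person: Peter
# location: New York
```

Because the pattern has two groups, re.findall returns a list of (tag, content) tuples, which can be passed to dict() when each tag occurs only once.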
