Nasser

Reputation: 2140

Python split tags based on regular expression

I would like to split the following tag <b size=5 alt=ref> as follows:

Open tag: b
Parm: size=5
Parm: alt=ref

I tried the following code to split the tag into groups, but it didn't work:

temp = '<b size=5 alt=ref>'
matchObj = re.search(r"(\S*)\s*(\S*)", temp)
print 'Open tag: ' + matchObj.groups()

My plan is to split the tag into groups, then print the first group as the open tag and the rest as Parm. Can you suggest an approach that would solve this problem?

Note that I read the tags from an HTML file; the example above is a single open tag, and the code shown is only the part I am stuck on.

Thanks

Upvotes: 0

Views: 215

Answers (2)

Mayur Koshti

Reputation: 1852

>>> import re
>>> temp = '<b size=5 alt=ref>'
>>> resList = re.findall(r"\S+", temp.replace("<", "").replace(">", ""))
>>> myDict = {}
>>> myDict["Open tag:"] = [resList[0]]
>>> myDict["Parm:"] = resList[1:]
>>> myDict
{'Open tag:': ['b'], 'Parm:': ['size=5', 'alt=ref']}
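
To get the line-by-line output the question asks for, the same whitespace split can be unpacked into a tag name and a list of parameters. This is only a minimal sketch: it assumes attribute values contain no spaces or quotes, and `describe_tag` is a hypothetical helper name, not something from the `re` module.

```python
import re

def describe_tag(tag):
    """Return printable lines for an opening tag such as '<b size=5 alt=ref>'."""
    # Drop the surrounding angle brackets, then split on runs of whitespace.
    parts = re.findall(r"\S+", tag.strip("<>"))
    if not parts:
        return []
    name, params = parts[0], parts[1:]
    # The first piece is the tag name; every remaining piece is a parameter.
    return ["Open tag: " + name] + ["Parm: " + p for p in params]

print("\n".join(describe_tag('<b size=5 alt=ref>')))
```

For anything beyond simple tags like this one, a real HTML parser is likely to be more robust than a regular expression.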

Upvotes: 0

LetzerWille

Reputation: 5658

import re
tag_names = ["Open tag:", "Parm:", "Parm:"]
# Split on '<', '>' and spaces, then drop the empty strings
# at the start and end of the resulting list.
tags = re.split(r'[<> ]', '<b size=5 alt=ref>')[1:-1]
# Pair each label in tag_names with the matching piece of the tag.
print(list(zip(tag_names, tags)))

[('Open tag:', 'b'), ('Parm:', 'size=5'), ('Parm:', 'alt=ref')]
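
Because `zip` stops at the shorter input, a fixed three-item label list only covers tags with exactly two parameters. A possible variation that generates the labels on demand is sketched below; `label_parts` is a hypothetical name, and the example still assumes attribute values without spaces or quotes.

```python
import re
from itertools import chain, repeat

def label_parts(tag):
    """Pair 'Open tag:' with the tag name and 'Parm:' with every parameter."""
    # Split on '<', '>' or runs of whitespace and discard empty strings.
    parts = [p for p in re.split(r"[<>\s]+", tag) if p]
    # One 'Open tag:' label followed by as many 'Parm:' labels as needed.
    labels = chain(["Open tag:"], repeat("Parm:"))
    return list(zip(labels, parts))

print(label_parts('<img src=a.png width=10 height=20 alt=ref>'))
```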

Upvotes: 2
