user2091880

Reputation: 21

How do I parse the following DOCTYPE in XML?

I have an XML string with the following DOCTYPE. How do I parse it? I need to get each of the file names given as the SYSTEM identifier of the <!ENTITY> declarations.

'''<?xml version="1.0" encoding="ISO-8859-1" ?>
<!DOCTYPE config SYSTEM "ncfg_config.dtd"
[
    <!ENTITY vlan_map_type     SYSTEM "types/a.xml">
    <!ENTITY oui_type            SYSTEM "types/b.xml">
    <!ENTITY provisioning_profile  SYSTEM "c.xml">
    <!ENTITY vlan_name_or_list  SYSTEM "types/d.xml">
    <!ENTITY vlan_name_or_num   SYSTEM "types/e.xml">
    <!ENTITY interface_list     SYSTEM "types/f.xml">
    <!ENTITY mac_limit_type     SYSTEM "types/g.xml">
]>'''

Upvotes: 0

Views: 345

Answers (2)

Timothy

Reputation: 4487

If the input always follows the format in your example, a regular expression is simpler:

import re

xml = '''<?xml version="1.0" encoding="ISO-8859-1" ?>
<!DOCTYPE config SYSTEM "ncfg_config.dtd"
[
    <!ENTITY vlan_map_type     SYSTEM "types/a.xml">
    <!ENTITY oui_type            SYSTEM "types/b.xml">
    <!ENTITY provisioning_profile  SYSTEM "c.xml">
    <!ENTITY vlan_name_or_list  SYSTEM "types/d.xml">
    <!ENTITY vlan_name_or_num   SYSTEM "types/e.xml">
    <!ENTITY interface_list     SYSTEM "types/f.xml">
    <!ENTITY mac_limit_type     SYSTEM "types/g.xml">
]>'''


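# Capture the quoted SYSTEM identifier of every <!ENTITY name SYSTEM "..."> declaration
file_names = re.findall(r'<!ENTITY\s+\S+\s+SYSTEM\s+"([^"]*)"\s*>', xml)
for name in file_names:
    print(name)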

Output:

types/a.xml
types/b.xml
c.xml
types/d.xml
types/e.xml
types/f.xml
types/g.xml  
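If the DOCTYPE might contain single-quoted identifiers or commented-out declarations, which a regex would miss or wrongly match, an XML parser can report the declarations directly. Below is a minimal sketch using the standard library's expat bindings and their EntityDeclHandler callback; the helper name entity_system_ids is hypothetical. It assumes the real document continues with a root element after "]>" (a placeholder <config/> here), because a parser rejects a document that has none. expat does not fetch the external ncfg_config.dtd by default, so only the internal subset is reported.

```python
from xml.parsers.expat import ParserCreate


def entity_system_ids(xml_text):
    """Return (entity name, SYSTEM identifier) pairs declared in the DOCTYPE."""
    found = []

    def on_entity_decl(name, is_parameter_entity, value, base,
                       system_id, public_id, notation_name):
        # External general entities carry a system_id; internal ones have a value instead.
        if system_id is not None and not is_parameter_entity:
            found.append((name, system_id))

    parser = ParserCreate()
    parser.EntityDeclHandler = on_entity_decl
    parser.Parse(xml_text, True)  # True: this is the whole document
    return found


# Sample trimmed to two declarations; <config/> is an assumed placeholder root element.
xml = '''<?xml version="1.0" encoding="ISO-8859-1" ?>
<!DOCTYPE config SYSTEM "ncfg_config.dtd"
[
    <!ENTITY vlan_map_type     SYSTEM "types/a.xml">
    <!ENTITY provisioning_profile  SYSTEM "c.xml">
]>
<config/>'''

for name, path in entity_system_ids(xml):
    print(name, path)
```

The entity name comes back alongside each file name, so it is easy to map a reference such as &vlan_map_type; to its file.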

Upvotes: 1

Reuben

Reputation: 5736

Have you tried HTMLParser?

Have a look at the HTMLParser documentation in the Python docs.
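HTMLParser is built for HTML, and its DOCTYPE handling stops at the first ">", so it does not see the whole internal subset. Since this input is XML, a closer standard-library fit (swapped in here, not what the answer names) is xml.dom.minidom, which keeps internal-subset entity declarations on doctype.entities. A minimal sketch, with a hypothetical helper name and the same assumption that a root element (placeholder <config/>) follows the DOCTYPE:

```python
from xml.dom.minidom import parseString


def doctype_entity_files(xml_text):
    """Return (entity name, SYSTEM identifier) pairs from the DOCTYPE's internal subset."""
    doc = parseString(xml_text)
    entities = doc.doctype.entities
    # entities is a read-only named node map: index it with item(i), not [i].
    return [(entities.item(i).nodeName, entities.item(i).systemId)
            for i in range(len(entities))]


# <config/> is an assumed placeholder root element.
xml = '''<?xml version="1.0" encoding="ISO-8859-1" ?>
<!DOCTYPE config SYSTEM "ncfg_config.dtd"
[
    <!ENTITY vlan_map_type     SYSTEM "types/a.xml">
    <!ENTITY provisioning_profile  SYSTEM "c.xml">
]>
<config/>'''

for name, path in doctype_entity_files(xml):
    print(name, path)
```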

Upvotes: 0
