user2274879

Reputation: 349

How to parse and display the content of an lxml object using lxml

I am having difficulty parsing the XML file below using lxml:

>>_file= "qv.xml"

file content:

<document reference="suspicious-document00500.txt">
  <feature name="plagiarism" type="artificial" obfuscation="none" this_offset="128" this_length="2503" source_reference="source-document00500.txt" source_offset="138339" source_length="2503"/>
  <feature name="plagiarism" type="artificial" obfuscation="none" this_offset="8593" this_length="1582" source_reference="source-document00500.txt" source_offset="49473" source_length="1582"/>
</document>

Here is my attempt:

>>from lxml.etree import XMLParser, parse
>>parsefile = parse(_file)

>>print parsefile

Output: <lxml.etree._ElementTree object at 0x000000000642E788>

The output is only the memory location of the lxml object, while I am after the actual file content, i.e.:

Desired output = {'document reference': 'suspicious-document00500.txt', 'this_offset': '128', 'obfuscation': 'none', 'source_length': '2503', 'name': 'plagiarism', 'this_length': '2503', 'source_reference': 'source-document00500.txt', 'source_offset': '138339', 'type': 'artificial'}

Any ideas on how to get the desired output? Thanks.

Upvotes: 0

Views: 194

Answers (1)

user2963623

Reputation: 2295

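Here's one way of getting the desired output:

from lxml import etree

def main():
    # parse() returns an _ElementTree; the data lives on its elements
    doc = etree.parse('qv.xml')
    root = doc.getroot()
    # attributes of the <document> element
    print(root.attrib)
    # iterating over an element yields its direct children (<feature>)
    for item in root:
        print(item.attrib)

if __name__ == "__main__":
    main()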

Output:

{'reference': 'suspicious-document00500.txt'}
{'this_offset': '128', 'obfuscation': 'none', 'source_length': '2503', 'name': 'plagiarism', 'this_length': '2503', 'source_reference': 'source-document00500.txt', 'source_offset': '138339', 'type': 'artificial'}
{'this_offset': '8593', 'obfuscation': 'none', 'source_length': '1582', 'name': 'plagiarism', 'this_length': '1582', 'source_reference': 'source-document00500.txt', 'source_offset': '49473', 'type': 'artificial'}

It works fine with the contents you gave. You might want to read this to see how etree represents XML objects.
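If you want something closer to the single dict in the question, one option is to merge the root's `reference` attribute into each feature's attributes. The following is only a sketch under a few assumptions: the helper name `collect_features` and the `reference` key are made up for illustration, the XML is inlined as a string so the example runs without `qv.xml`, and the standard library's `xml.etree.ElementTree` is used as a fallback if lxml is not installed (the calls used here exist in both).

```python
try:
    from lxml import etree  # the library used in this thread
except ImportError:
    # Assumption: the stdlib module is an acceptable stand-in for this sketch
    import xml.etree.ElementTree as etree

# Same content as qv.xml in the question, inlined to keep the example self-contained
XML_TEXT = """<document reference="suspicious-document00500.txt">
  <feature name="plagiarism" type="artificial" obfuscation="none" this_offset="128" this_length="2503" source_reference="source-document00500.txt" source_offset="138339" source_length="2503"/>
  <feature name="plagiarism" type="artificial" obfuscation="none" this_offset="8593" this_length="1582" source_reference="source-document00500.txt" source_offset="49473" source_length="1582"/>
</document>"""


def collect_features(xml_text):
    """Hypothetical helper: return one plain dict per <feature> element,
    each also carrying the parent document's 'reference' attribute."""
    root = etree.fromstring(xml_text)
    records = []
    for feature in root.iter("feature"):
        record = {"reference": root.get("reference")}
        # copy the element's attributes into an ordinary dict
        record.update(dict(feature.attrib))
        records.append(record)
    return records


records = collect_features(XML_TEXT)
for record in records:
    print(record)
```

With a file on disk, `etree.parse('qv.xml').getroot()` would take the place of `etree.fromstring(xml_text)`. All values come back as strings, so offsets and lengths need an explicit `int(...)` if you plan to do arithmetic on them.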

Upvotes: 1
