sidnical
sidnical

Reputation: 459

Python Parse XML file for certain lines and output the line to Text widget

I need to search a windows msinfo file (.nfo) for certain lines and print them to a Text widget. I can print(line) ever line in the file and I can output every line to the Text widget but as soon as I try to specify lines to output it stops working. I assume this is because the file is an XML but the XML parsing tools I see for python seem to look for lines like data=blah. The entries im looking for look like this when I open them in a txt editor:

    <Category name="Disks">
<Data>
<Item><![CDATA[Description]]></Item>
<Value><![CDATA[Disk drive]]></Value>
</Data>
<Data>
<Item><![CDATA[Manufacturer]]></Item>
<Value><![CDATA[(Standard disk drives)]]></Value>
</Data>
<Data>
<Item><![CDATA[Model]]></Item>
<Value><![CDATA[TOSHIB  MK1652GSX SCSI Disk Device]]></Value>
</Data>
<Data>
<Item><![CDATA[Bytes/Sector]]></Item>
<Value><![CDATA[512]]></Value>
</Data>
<Data>
<Item><![CDATA[Media Loaded]]></Item>
<Value><![CDATA[Yes]]></Value>
</Data>
<Data>
<Item><![CDATA[Media Type]]></Item>
<Value><![CDATA[Fixed hard disk]]></Value>
</Data>
<Data>
<Item><![CDATA[Partitions]]></Item>
<Value><![CDATA[2]]></Value>
</Data>
<Data>
<Item><![CDATA[SCSI Bus]]></Item>
<Value><![CDATA[1]]></Value>
</Data>
<Data>
<Item><![CDATA[SCSI Logical Unit]]></Item>
<Value><![CDATA[0]]></Value>
</Data>
<Data>
<Item><![CDATA[SCSI Port]]></Item>
<Value><![CDATA[0]]></Value>
</Data>
<Data>
<Item><![CDATA[SCSI Target ID]]></Item>
<Value><![CDATA[0]]></Value>
</Data>
<Data>
<Item><![CDATA[Sectors/Track]]></Item>
<Value><![CDATA[63]]></Value>
</Data>
<Data>
<Item><![CDATA[Size]]></Item>
<Value><![CDATA[149.05 GB (160,039,272,960 bytes)]]></Value>
</Data>
<Data>
<Item><![CDATA[Total Cylinders]]></Item>
<Value><![CDATA[19,457]]></Value>
</Data>
<Data>
<Item><![CDATA[Total Sectors]]></Item>
<Value><![CDATA[312,576,705]]></Value>
</Data>
<Data>
<Item><![CDATA[Total Tracks]]></Item>
<Value><![CDATA[4,961,535]]></Value>
</Data>
<Data>
<Item><![CDATA[Tracks/Cylinder]]></Item>
<Value><![CDATA[255]]></Value>
</Data>
<Data>
<Item><![CDATA[Partition]]></Item>
<Value><![CDATA[Disk #1, Partition #0]]></Value>
</Data>
<Data>
<Item><![CDATA[Partition Size]]></Item>
<Value><![CDATA[117.19 GB (125,830,301,184 bytes)]]></Value>
</Data>
<Data>
<Item><![CDATA[Partition Starting Offset]]></Item>
<Value><![CDATA[32,256 bytes]]></Value>
</Data>
<Data>
<Item><![CDATA[Partition]]></Item>
<Value><![CDATA[Disk #1, Partition #1]]></Value>
</Data>
<Data>
<Item><![CDATA[Partition Size]]></Item>
<Value><![CDATA[31.85 GB (34,200,714,240 bytes)]]></Value>
</Data>
<Data>
<Item><![CDATA[Partition Starting Offset]]></Item>
<Value><![CDATA[125,830,333,440 bytes]]></Value>
</Data>
<Data>

I found a post asking for what I want but the solution doesn't work. The ET.parse is not found:

import xml.etree as ET
file = 'D:\\MsInfo\\msinfo.nfo'
tree = ET.parse(file)
root = tree.getroot()

for element in root.findall('Category'):
    value = element.find('Data')
    for child in value:
        print(child.tag ,":",child.text)

When using the above I get this:

"C:\Program Files (x86)\Python35-32\python.exe" "D:/MY STUFF/Programming/Python/testing.py" Traceback (most recent call last): File "D:/MY STUFF/Programming/Python/testing.py", line 3, in tree = ET.parse(file) AttributeError: module 'xml.etree' has no attribute 'parse'

Process finished with exit code 1

This is a snippet from my code:

try:
    u = find("msinfo.nfo", s)
    for i in u:
        cpfotxt.insert('end', i + "\n")
        cpfotxt.yview(END)
        cpfotxt.insert('end', "================================= \n")
        with open(i, "r") as f:
            r = f.readlines()
            for line in r:
                if "Model" in line:
                    cpfotxt.insert('end', line + "\n")

If I remove the if "Model" in line: then it will dump everything into the Text widget fine.

This is how they look when opened normally with on windows:

enter image description here

Any advice on how to pull lines I need from an nfo/XML file?

Also, when printing lines from an xml the font is bigger and double spaced. How can I make the line print the same way it would from a normal txt file?

Upvotes: 1

Views: 10913

Answers (2)

drez90
drez90

Reputation: 814

So you need to understand the structure of the XML and then use the actual tags you're looking for instead of 'Data'

    item = element.find('Item') 
    print(item.tag ,":",item.text)
    value = element.find('Value') 
    print(value.tag ,":",value.text)

Your actual problem is that you need to change the import you use.

import xml.etree.ElementTree as ET

https://docs.python.org/2/library/xml.etree.elementtree.html

Edit: with the way that's structured, you can get a list of Data elements by saying

for data in root.findall('Data'):
    item = data.find('Item') 
    print(item.tag ,":",item.text)
    value = data.find('Value') 
    print(value.tag ,":",value.text)

Now, understand that if that "Data" tag is not at the root level, then you need to root.find() until you can get to it. In other words, if those "Data" tags are enclosed in some parent tags, you need to root.find("Parent Tag"), hope you get the gist of it

Edit2: Looked at my own msinfo.nfo file and this worked:

disks = root.find(".//Category[@name='Disks']")

for disk in disks:
    item = disk.find('Item')
    print(item.tag ,":",item.text)
    value = disk.find('Value')
    print(value.tag ,":",value.text)

Note: This uses XPath syntax to find the element, which is only available in ElementTree1.3 (Python 2.7 and higher). You can also brute force it by following the structure of the XML and traversing through the tree until you get to Disks. The path was System Summary->Components->Storage->Disks and under Disks were those Data elements with Item and Value as children.

Upvotes: 1

Iman Mirzadeh
Iman Mirzadeh

Reputation: 13550

Here is my code with your sample data, I know it could be written better but I think this solves your problem :)
you have to find the root(xml) and then iterate it's texts ! you can also use other methods like iterfind for better solutions

xml_file  = "<xml><Item><![CDATA[Model]]></Item><Value><![CDATA[TOSHIB  MK1652GSX SCSI Disk Device]]></Value></xml>"
from xml.etree import ElementTree
root = ElementTree.fromstring(xml_file)

start = root.itertext()

while True:
    try:
        print start.next()
    except StopIteration:
        break

Here is the output:

>>>Model
>>>TOSHIB  MK1652GSX SCSI Disk Device

Upvotes: 0

Related Questions