Scrape XML file with Python

Question

I have been trying to scrape an XML file to copy the content of only two tags, Code and Source. The XML file looks as follows:

I can only get the first Code value to be scraped. My code is below. Can anyone help?

import xml.etree.ElementTree as ET
import csv

tree = ET.parse("data.xml")
csv_fname = "data.csv"
root = tree.getroot()

f = open(csv_fname, 'w')
csvwriter = csv.writer(f)
count = 0
head = ['Code', 'Source']

csvwriter.writerow(head)

for time in root.findall('Instruments'):
    row = []
    job_name = time.find('Instrument').find('Code').text
    row.append(job_name)
    job_name_1 = time.find('Instrument').find('Source').text
    row.append(job_name_1)
    csvwriter.writerow(row)
f.close()

Anubhav Singh · Accepted Answer

The XML file in your post is invalid. You can check this by pasting the file here: https://www.w3schools.com/xml/xml_validator.asp
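The same check can also be done locally. Below is a minimal sketch that uses the standard library parser to report whether a string is well-formed XML. The helper name `is_well_formed` and both sample strings are made up for illustration; they are not the file from the question.

```python
import xml.etree.ElementTree as ET


def is_well_formed(xml_text):
    """Return (True, None) if xml_text parses, else (False, parser message)."""
    try:
        ET.fromstring(xml_text)
        return True, None
    except ET.ParseError as err:
        # ParseError carries the line/column where parsing stopped.
        return False, str(err)


# Hypothetical samples: the second one never closes <Instrument>.
good = "<Root><Instruments><Instrument><Code>A1</Code></Instrument></Instruments></Root>"
bad = "<Root><Instruments><Instrument><Code>A1</Code></Instruments></Root>"

print(is_well_formed(good))
print(is_well_formed(bad))
```

This only checks well-formedness (matching tags, a single root), not validity against a schema.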

The valid XML, I assume, would be:

To print the values in the Code and Source tags:

from lxml import etree

root = etree.parse('data.xml').getroot()
# Assumes <Instruments> is a direct child of the root element.
instruments = root.find('Instruments')
# findall returns every <Instrument> child, not just the first one.
for instrument in instruments.findall('Instrument'):
    code, source = instrument.find('Code'), instrument.find('Source')
    print(code.text, source.text)
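Since the question writes to a CSV file rather than printing, here is a minimal sketch of that variant using only the standard library. It assumes the same layout as above (a root containing Instruments, which contains several Instrument elements, each with Code and Source). `SAMPLE_XML` and the helper names are invented for illustration. The question's loop calls `find('Instrument')`, which returns only the first match, so this sketch iterates over every Instrument instead.

```python
import csv
import io
import xml.etree.ElementTree as ET

# Hypothetical sample; the real file from the question is not shown.
SAMPLE_XML = """<Root>
  <Instruments>
    <Instrument><Code>A1</Code><Source>X</Source></Instrument>
    <Instrument><Code>B2</Code><Source>Y</Source></Instrument>
  </Instruments>
</Root>"""


def instruments_to_rows(root):
    """Collect [Code, Source] for every <Instrument> below root."""
    rows = []
    for instrument in root.iter('Instrument'):
        # findtext returns the default when the child tag is missing.
        code = instrument.findtext('Code', default='')
        source = instrument.findtext('Source', default='')
        rows.append([code, source])
    return rows


def write_csv(rows, out):
    """Write a header line followed by the rows to a file-like object."""
    writer = csv.writer(out)
    writer.writerow(['Code', 'Source'])
    writer.writerows(rows)


root = ET.fromstring(SAMPLE_XML)
rows = instruments_to_rows(root)

# An in-memory buffer keeps the sketch self-contained.
buffer = io.StringIO()
write_csv(rows, buffer)
print(buffer.getvalue())
```

To work with real files, replace `ET.fromstring(SAMPLE_XML)` with `ET.parse("data.xml").getroot()` and pass `open("data.csv", "w", newline="")` to `write_csv`, ideally inside a `with` block so the file is closed automatically.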
