Mia
Mia

Reputation: 181

Update XML text content based on path and text values stored in csv efficiently

I have a csv file (csvlist.csv) that contains the paths of the XML elements I need to change in the first column. The texts I need to change original node values to are given in columns 2 - 10,000 on wards.

Path                                                            Edit1       Edit2       Edit3       Edit4       Edit5          ----  Edit1000
".//data/country[@name="Singapore"]/gdpnp[@month="08"]/state",  5.2e-015,   2e-05,      8e-06,      9e-04,      0.4e-05,   
".//data/country[@name="Peru"]/gdppc[@month="06"]/region",      0.04,       0.02,       0.15,       3.24,       0.98,                                                 

I would like to replace the text of the elements of the original XML file (NoEdit.xml), based on the paths in column 1, by values in each subsequent column and name accordingly e.g. XML based on column 2 values will be named Edit2.xml.

import csv
import xml.etree.ElementTree as ET
tree = ET.parse('NoEdit.xml')      
with open('csvlist.csv', 'rb') as csvlist:
    reader = csv.reader(csvlist, delimiter=',')
for x in range(1, 1000):
    for row in reader:
        if reader.line_num == 1: continue # skip the row of headers
        for data in tree.findall(row[0]):
            data.text = row[(x)]
            tree.write('Edit(x).xml')

Based on help on this forum q1 q2 I have gotten this far @ the code below. I get the error:

KeyError: '".//data/country[@name="'

When I give a fixed path to remove this error I still get error on findall or I just don't get the right xml.

I would appreciate any help regards direction with this. Please feel free to suggest alternate methods of doing this as well. Thanks

Upvotes: 1

Views: 486

Answers (1)

Birei
Birei

Reputation: 36272

First of all, you should provide a reproducible example, to help other users to help you. I've done some of the work for you and I've created a test csv file, like:

Path,                                             Edit1,       Edit2
".//first",  5.2e-015,   2e-05
".//second",      0.04,       0.02

Note that I added commas to the header, because you missed them, and I don't know if it was intentional or typing error.

Also I've created a small xml file:

<root>
    <first>1</first>
    <second>2</second>
    <third>3</third>
</root>

And the script:

import csv
from lxml import etree
import sys

xmldata = open(sys.argv[2], newline='').read();

with open(sys.argv[1], newline='') as csvfile:
    for i, pivoted_row in enumerate(zip(*csv.reader(csvfile, delimiter=','))):
        if i == 0:
            xpaths = pivoted_row
            continue
        pivoted_row = [c.strip() for c in pivoted_row]
        tree = etree.fromstring(xmldata)
        with open(pivoted_row[0] + ".xml", 'wb') as outfile:
            for j in range(1, len(xpaths)):
                tree.xpath(xpaths[j])[0].text = pivoted_row[j]
            outfile.write(etree.tostring(tree))

The key part is to pivot the csv so I can handle all the data to write to the same file at once. So this way the first column (xpaths) will be first row, so I save them in a variable, that traverse for each other row.

Run it like:

python3 script.py csvfile xmlfile

It creates two files, Edit1.xml and Edit2.xml, with following content:

==> Edit1.xml <==
<root>
    <first>5.2e-015</first>
    <second>0.04</second>
    <third>3</third>
</root>
==> Edit2.xml <==
<root>
    <first>2e-05</first>
    <second>0.02</second>
    <third>3</third>
</root>

I hope it can be useful and set you in the good way to solve your problem.

Upvotes: 1

Related Questions