Reputation: 13
I am trying to convert multiple csv files for(two for now) into xml using Element tree but I am not getting exact output. Please guide me with more efficient approach. PS:I'm a beginner here.
import csv
import xml.etree.ElementTree as ET
#from bs4 import BeautifulSoup
root = ET.Element('Policy')
with open("policy.csv","r") as p, open("Att.csv","r") as a, open("rider.csv","r") as r:
csv_p = csv.reader(p)
header_p = next(csv_p)
csv_a = csv.reader(a)
header_a = next(csv_a)
csv_r = csv.reader(r)
header_r = next(csv_r)
for row in csv_p:
pid = row[0]
print("\n",pid)
for col in range(len(header_p)):
ET.SubElement(root, header_p[col]).text = str(row[col])
for childrow in csv_a:
if(pid == childrow[0]):
print("Match found")
child = ET.SubElement(root,"child")
for col_a in range(len(header_a)):
ET.SubElement(child, header_a[col_a]).text = str(childrow[col_a])
for tailrow in csv_r:
if(childrow[1] == tailrow[0]):
print("tail found",tailrow[0])
tail = ET.SubElement(child,"tail")
for col_r in range(len(header_r)):
ET.SubElement(tail, header_r[col_r]).text = str(tailrow[col_r])
r.seek(0)
a.seek(0)
tree = ET.tostring(root, encoding="UTF-8")
#print(BeautifulSoup(tree, "xml").prettify())
with open("Output.xml", "wb") as f:
f.write(tree)
with open('Output.xml', 'r') as f:
print("\n\n",f.read())
The output looks like below but as you can see some tags being repeated becuase they are redundant in files i'm reading :
Policy.csv:
Pid,Name,Date
101,Life In,3Jan2017
102,Mobile,8Aug2018
Att.csv:
PId,AId,Name
101,9001,Pune
101,9002,Mumbai
102,9003,Delhi
rider.csv:
AId,RID,Name
9001,10001,Ramesh
9001,10002,Suresh
9002,10003,Rahul
9002,10004,Kirti
Output:
<Policy>
<Pid>101</Pid>
<child>
<PId>101</PId>
<tail><AId>9001</AId>
<RID>10001</RID>
<Name>Ramesh</Name>
</tail>
<tail>
<AId>9001</AId>
<RID>10002</RID>
<Name>Suresh</Name>
</tail>
<AId>9001</AId>
<Name>Pune</Name>
</child>
<child>
<PId>101</PId>
<tail><AId>9002</AId>
<RID>10003</RID>
<Name>Rahul</Name>
</tail>
<tail><AId>9002</AId>
<RID>10004</RID>
<Name>Kirti</Name>
</tail>
<AId>9002</AId>
<Name>Mumbai</Name>
</child>
<Name>Life In</Name>
<Date>3Jan2017</Date>
</Policy>
Instance of Desired Output:
<Policy>
<Pid>101</Pid>
<child>
<AId>9001</AId>
<tail>
<RID>10001</RID>
<Name>Ramesh</Name>
</tail>
<tail>
<RID>10002</RID>
<Name>Suresh</Name>
</tail>
<Name>Pune</Name>
</child>
<Name>Life In</Name>
<Date>3Jan2017</Date>
</Policy>
Upvotes: 0
Views: 702
Reputation: 52858
If you are able to use lxml, here's an example of what I was talking about in the comments.
Hopefully I have the logic correct:
policy
is based on a row in Policy.csv. It's uniquely identified by Pid
.child
in policy
is based on a row in Att.csv that has a matching PId
.tail
in child
is based on a row in rider.csv that has a matching AId
.The first thing I would do is convert the csv into a temporary XML format.
Since the header values of your csv files would be valid element names, I'd go ahead and create elements based on those values.
If your csv files might have header values that aren't valid element names, you could use a generic element name and store the header value in an attribute. (I can change my example if needed.)
Then I'd transform the temporary XML and handle all of the grouping there. Since lxml only supports XSLT 1.0, we'd have to use Muenchian Grouping.
Example...
Python
import csv
from os import path
from lxml import etree
def csv2xml(file):
result = etree.Element(path.splitext(file)[0])
with open(file) as csvfile:
reader = csv.DictReader(csvfile)
for row in reader:
row_elem = etree.SubElement(result, "row")
for entry in row:
entry_elem = etree.SubElement(row_elem, entry.strip().lower())
entry_elem.text = row.get(entry).strip()
return result
csv_files = ["policy.csv", "att.csv", "rider.csv"]
temp_xml = etree.Element("policies")
for csv_file in csv_files:
xml = csv2xml(csv_file)
temp_xml.append(xml)
xslt = etree.parse("transform.xsl")
xml_output = etree.ElementTree(temp_xml).xslt(xslt)
print(etree.tostring(xml_output, pretty_print=True).decode())
XSLT (transform.xsl)
<xsl:stylesheet version="1.0" xmlns:xsl="http://www.w3.org/1999/XSL/Transform">
<xsl:output indent="yes"/>
<xsl:strip-space elements="*"/>
<xsl:key name="policy" match="policy/row" use="pid"/>
<xsl:key name="att" match="att/row" use="pid"/>
<xsl:key name="rider" match="rider/row" use="aid"/>
<xsl:template match="@*|node()">
<xsl:copy>
<xsl:apply-templates select="@*|node()"/>
</xsl:copy>
</xsl:template>
<xsl:template match="/*">
<xsl:copy>
<xsl:apply-templates select="policy"/>
</xsl:copy>
</xsl:template>
<xsl:template match="policy">
<xsl:for-each select="row[count(.|key('policy', pid)[1])=1]">
<policy>
<xsl:apply-templates select="pid"/>
<xsl:apply-templates select="key('att', pid)"/>
<xsl:apply-templates select="name|date"/>
</policy>
</xsl:for-each>
</xsl:template>
<xsl:template match="att/row">
<child>
<xsl:apply-templates select="aid"/>
<xsl:apply-templates select="key('rider', aid)"/>
<xsl:apply-templates select="name"/>
</child>
</xsl:template>
<xsl:template match="rider/row">
<tail>
<xsl:apply-templates select="rid|name"/>
</tail>
</xsl:template>
</xsl:stylesheet>
The Python will print this output:
<policies>
<policy>
<pid>101</pid>
<child>
<aid>9001</aid>
<tail>
<rid>10001</rid>
<name>Ramesh</name>
</tail>
<tail>
<rid>10002</rid>
<name>Suresh</name>
</tail>
<name>Pune</name>
</child>
<child>
<aid>9002</aid>
<tail>
<rid>10003</rid>
<name>Rahul</name>
</tail>
<tail>
<rid>10004</rid>
<name>Kirti</name>
</tail>
<name>Mumbai</name>
</child>
<name>Life In</name>
<date>3Jan2017</date>
</policy>
<policy>
<pid>102</pid>
<child>
<aid>9003</aid>
<name>Delhi</name>
</child>
<name>Mobile</name>
<date>8Aug2018</date>
</policy>
</policies>
Hopefully this helps.
Upvotes: 1