Reputation: 409
I'm trying to convert an xml file to csv. The file looks like this :
<?xml version="1.0" encoding="UTF-8" standalone="no"?>
<tns:FlxPtn xmlns:ts2c="http://interop.covea.fr/Covea-Flx-TypesS2C-009" xmlns:tns="http://interop.covea.fr/Covea-App-PolRntVie-024" xmlns:rf2c="http://interop.covea.fr/Covea-Referentiel" xmlns:cov="http://interop.covea.fr/Covea-FlxPtn-002" xmlns:fs2c="http://interop.covea.fr/Covea-Flx-EneFncS2C-007" xsi:schemaLocation="http://interop.covea.fr/Covea-Flx-TypesS2C-009 Covea-Flx-TypesS2C-009.xsd http://interop.covea.fr/Covea-App-PolRntVie-024 S2C_XSD_VIGERIRVES_V10.0_024_MMA_124.xsd http://interop.covea.fr/Covea-App-PolRntVie-024 S2C_XSD_VIGERIRVES_V10.0_024.xsd http://interop.covea.fr/Covea-Referentiel Covea-Referentiel.xsd http://interop.covea.fr/Covea-FlxPtn-002 Covea-FlxPtn-002.xsd http://interop.covea.fr/Covea-Flx-EneFncS2C-007 Covea-Flx-EneFncS2C-007.xsd" xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance">
<cov:DonEneTch>
<cov:IdVrsEne>002</cov:IdVrsEne>
<cov:IdFlx>1V400220191231VIGERIRVESMMA1</cov:IdFlx>
<cov:TsCraFlx>2020-01-13 10.02.13.000000</cov:TsCraFlx>
<cov:IdEmtFlx>MMA</cov:IdEmtFlx>
<cov:IdRctFlx>MMA</cov:IdRctFlx>
<cov:TyFlx>NOTIF</cov:TyFlx>
<cov:TyTrtFlx>1</cov:TyTrtFlx>
<cov:AquFlx>0</cov:AquFlx>
<cov:EvrExu>PRODUCTION</cov:EvrExu>
<cov:IdApnEmt>MMA</cov:IdApnEmt>
<cov:AcnApl>VIGERIRVES</cov:AcnApl>
<cov:IdVrsFlx>124</cov:IdVrsFlx>
<cov:IdTrtEmt>Rentes Vie</cov:IdTrtEmt>
<cov:IdUtl>TTBATCH</cov:IdUtl>
<cov:VrsCbl></cov:VrsCbl>
<cov:ChpLbr>1V400220191231VIGERIRVESMMA1377900</cov:ChpLbr>
</cov:DonEneTch>
<tns:DonMet>
<tns:DonEneFnc>
<fs2c:CodSocJur>1V4002</fs2c:CodSocJur>
<fs2c:DatArr>20191231</fs2c:DatArr>
<fs2c:TypFiColl>VIGERIRVES</fs2c:TypFiColl>
<fs2c:TimStmCreFic>2020-01-13 10.02.13.000000</fs2c:TimStmCreFic>
<fs2c:CodEns>MMA</fs2c:CodEns>
</tns:DonEneFnc>
<tns:PolRntVie>
<tns:NumEnr>20191290</tns:NumEnr>
<tns:NumPol>050000111901</tns:NumPol>
<tns:PMVie>997.75</tns:PMVie>
</tns:PolRntVie>
<tns:PolRntVie>
<tns:NumEnr>20191291</tns:NumEnr>
<tns:NumPol>050000112002</tns:NumPol>
<tns:PMVie>4385.15</tns:PMVie>
</tns:PolRntVie>
What I would like to extract is the last part informations "NumEnr", "NumPol", "PMVie".
I tried to adapt some code samples as below but I'm not familiar enough to xml to make it work.
import pandas as pd
from xml.etree import ElementTree as et
import csv
libname = "C:/Users/a61787/Documents/"
tree = ET.parse(libname+'rente.xml')
with open(libname+'b.csv','w',newline='',encoding='utf8') as sitescope_data:
csvwriter = csv.writer(sitescope_data)
col_names = 'Numenr NumPol PMVie'.split()
csvwriter.writerow(col_names)
for event in tree.findall('tns:DonMet/tns:PolRntVie'):
event_data = ['' if (e:=event.find(col)) is None else e.text for col in col_names]
csvwriter.writerow(event_data)
dataframe = pd.read_csv('b.csv',encoding='utf8')
print(dataframe.shape)
Which give me a empty DF with only the col names.
Finally what i want to obtain is the following table :
Numenr NumPol PMVie
20191290 050000111901 997.75
20191291 050000112002 4385.15
If you have any idea to do that, I would be very grateful
Upvotes: 1
Views: 306
Reputation: 46759
You could do this using beautifulsoup
as follows:
from bs4 import BeautifulSoup
import csv
with open('rente.xml') as f_input:
soup = BeautifulSoup(f_input, "lxml")
with open('b.csv', 'w', newline='', encoding='utf-8') as f_output:
csv_output = csv.writer(f_output)
csv_output.writerow(['Numenr', 'NumPol', 'PMVie'])
for tns in soup.find_all("tns:polrntvie"):
csv_output.writerow(tns.find(entry).text for entry in ['tns:numenr', 'tns:numpol', 'tns:pmvie'])
This would give you a b.csv
file containing:
Numenr,NumPol,PMVie
20191290,050000111901,997.75
20191291,050000112002,4385.15
If items are missing:
from bs4 import BeautifulSoup
import csv
with open('rente.xml') as f_input:
soup = BeautifulSoup(f_input, "lxml")
with open('b.csv', 'w', newline='', encoding='utf-8') as f_output:
csv_output = csv.writer(f_output)
csv_output.writerow(['Numenr', 'NumPol', 'PMVie'])
for tns in soup.find_all("tns:polrntvie"):
row = []
for entry in ['tns:numenr', 'tns:numpol', 'tns:pmvie']:
try:
row.append(tns.find(entry).text)
except AttributeError:
row.append('')
csv_output.writerow(row)
Upvotes: 1