Bebio
Bebio

Reputation: 409

Convert XML to csv file with Python

I'm trying to convert an xml file to csv. The file looks like this :

<?xml version="1.0" encoding="UTF-8" standalone="no"?>
<tns:FlxPtn xmlns:ts2c="http://interop.covea.fr/Covea-Flx-TypesS2C-009" xmlns:tns="http://interop.covea.fr/Covea-App-PolRntVie-024" xmlns:rf2c="http://interop.covea.fr/Covea-Referentiel" xmlns:cov="http://interop.covea.fr/Covea-FlxPtn-002" xmlns:fs2c="http://interop.covea.fr/Covea-Flx-EneFncS2C-007" xsi:schemaLocation="http://interop.covea.fr/Covea-Flx-TypesS2C-009 Covea-Flx-TypesS2C-009.xsd http://interop.covea.fr/Covea-App-PolRntVie-024 S2C_XSD_VIGERIRVES_V10.0_024_MMA_124.xsd http://interop.covea.fr/Covea-App-PolRntVie-024 S2C_XSD_VIGERIRVES_V10.0_024.xsd http://interop.covea.fr/Covea-Referentiel Covea-Referentiel.xsd http://interop.covea.fr/Covea-FlxPtn-002 Covea-FlxPtn-002.xsd http://interop.covea.fr/Covea-Flx-EneFncS2C-007 Covea-Flx-EneFncS2C-007.xsd" xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance">
  <cov:DonEneTch>
    <cov:IdVrsEne>002</cov:IdVrsEne>
    <cov:IdFlx>1V400220191231VIGERIRVESMMA1</cov:IdFlx>
    <cov:TsCraFlx>2020-01-13 10.02.13.000000</cov:TsCraFlx>
    <cov:IdEmtFlx>MMA</cov:IdEmtFlx>
    <cov:IdRctFlx>MMA</cov:IdRctFlx>
    <cov:TyFlx>NOTIF</cov:TyFlx>
    <cov:TyTrtFlx>1</cov:TyTrtFlx>
    <cov:AquFlx>0</cov:AquFlx>
    <cov:EvrExu>PRODUCTION</cov:EvrExu>
    <cov:IdApnEmt>MMA</cov:IdApnEmt>
    <cov:AcnApl>VIGERIRVES</cov:AcnApl>
    <cov:IdVrsFlx>124</cov:IdVrsFlx>
    <cov:IdTrtEmt>Rentes Vie</cov:IdTrtEmt>
    <cov:IdUtl>TTBATCH</cov:IdUtl>
    <cov:VrsCbl></cov:VrsCbl>
    <cov:ChpLbr>1V400220191231VIGERIRVESMMA1377900</cov:ChpLbr>
  </cov:DonEneTch>
  <tns:DonMet>
    <tns:DonEneFnc>
      <fs2c:CodSocJur>1V4002</fs2c:CodSocJur>
      <fs2c:DatArr>20191231</fs2c:DatArr>
      <fs2c:TypFiColl>VIGERIRVES</fs2c:TypFiColl>
      <fs2c:TimStmCreFic>2020-01-13 10.02.13.000000</fs2c:TimStmCreFic>
      <fs2c:CodEns>MMA</fs2c:CodEns>
    </tns:DonEneFnc>

    <tns:PolRntVie>
      <tns:NumEnr>20191290</tns:NumEnr>
      <tns:NumPol>050000111901</tns:NumPol>
      <tns:PMVie>997.75</tns:PMVie>
    </tns:PolRntVie>
    <tns:PolRntVie>
      <tns:NumEnr>20191291</tns:NumEnr>
      <tns:NumPol>050000112002</tns:NumPol>
      <tns:PMVie>4385.15</tns:PMVie>
    </tns:PolRntVie>

What I would like to extract is the last part informations "NumEnr", "NumPol", "PMVie".

I tried to adapt some code samples as below but I'm not familiar enough to xml to make it work.

import pandas as pd
from xml.etree import ElementTree as et
import csv

libname = "C:/Users/a61787/Documents/"

tree = ET.parse(libname+'rente.xml')
with open(libname+'b.csv','w',newline='',encoding='utf8') as sitescope_data:
    csvwriter = csv.writer(sitescope_data)
    col_names = 'Numenr NumPol PMVie'.split()
    csvwriter.writerow(col_names)
    for event in tree.findall('tns:DonMet/tns:PolRntVie'):
        event_data = ['' if (e:=event.find(col)) is None else e.text for col in col_names]
        csvwriter.writerow(event_data)

dataframe = pd.read_csv('b.csv',encoding='utf8')
print(dataframe.shape)

Which give me a empty DF with only the col names.

Finally what i want to obtain is the following table :

Numenr      NumPol          PMVie
20191290    050000111901    997.75
20191291    050000112002    4385.15

If you have any idea to do that, I would be very grateful

Upvotes: 1

Views: 306

Answers (1)

Martin Evans
Martin Evans

Reputation: 46759

You could do this using beautifulsoup as follows:

from bs4 import BeautifulSoup
import csv

with open('rente.xml') as f_input:
    soup = BeautifulSoup(f_input, "lxml")

with open('b.csv', 'w', newline='', encoding='utf-8') as f_output:
    csv_output = csv.writer(f_output)
    csv_output.writerow(['Numenr', 'NumPol', 'PMVie'])
    
    for tns in soup.find_all("tns:polrntvie"):
        csv_output.writerow(tns.find(entry).text for entry in ['tns:numenr', 'tns:numpol', 'tns:pmvie'])

This would give you a b.csv file containing:

Numenr,NumPol,PMVie
20191290,050000111901,997.75
20191291,050000112002,4385.15

If items are missing:

from bs4 import BeautifulSoup
import csv

with open('rente.xml') as f_input:
    soup = BeautifulSoup(f_input, "lxml")

with open('b.csv', 'w', newline='', encoding='utf-8') as f_output:
    csv_output = csv.writer(f_output)
    csv_output.writerow(['Numenr', 'NumPol', 'PMVie'])
    
    for tns in soup.find_all("tns:polrntvie"):
        row = []
        
        for entry in ['tns:numenr', 'tns:numpol', 'tns:pmvie']:
            try:
                row.append(tns.find(entry).text)
            except AttributeError:
                row.append('')
                
        csv_output.writerow(row)

Upvotes: 1

Related Questions