Pandas: Parse xml column and split into separate rows

Question

I have a dataframe that looks like this (1 sample row)

    Title   Date    Supervisor  Nightshift  XML
    NaN 27/10/2021  Michael Myres   No


The dataframe has one column that an contains nested XML (below)
    

    
    
        
            **1203**
            **Bodyshop**
            **1234**
            **1234**
            **On Tools**
            **Turnaround**
            **Me**
            <_x0034_4508821-eb2d-45be-b408-5db4df886890 type="System.String">&lt;?xml version=&quot;1.0&quot; encoding=&quot;utf-8&quot;?&gt;&lt;RepeaterData&gt;&lt;Version /&gt;&lt;Items&gt;&lt;Item&gt;&lt;control_EmployeeName type=&quot;System.String&quot;&gt;**Elliot**&lt;/control_EmployeeName&gt;&lt;control_NHours type=&quot;System.Int32&quot;&gt;**1**&lt;/control_NHours&gt;&lt;control_PHours type=&quot;System.Int32&quot;&gt;**2**&lt;/control_PHours&gt;&lt;control_DHours type=&quot;System.Int32&quot;&gt;**3**&lt;/control_DHours&gt;&lt;/Item&gt;&lt;Item&gt;&lt;control_EmployeeName type=&quot;System.String&quot;&gt;**Ryan** &lt;/control_EmployeeName&gt;&lt;control_NHours type=&quot;System.Int32&quot;&gt;**1**&lt;/control_NHours&gt;&lt;control_PHours type=&quot;System.Int32&quot;&gt;**2**&lt;/control_PHours&gt;&lt;control_DHours type=&quot;System.Int32&quot;&gt;**4**&lt;/control_DHours&gt;&lt;/Item&gt;&lt;Item&gt;&lt;control_EmployeeName type=&quot;System.String&quot;&gt;**Elliot**&lt;/control_EmployeeName&gt;&lt;control_NHours type=&quot;System.Int32&quot;&gt;**1**&lt;/control_NHours&gt;&lt;control_PHours type=&quot;System.Int32&quot;&gt;**1**&lt;/control_PHours&gt;&lt;control_DHours type=&quot;System.Int32&quot;&gt;1&lt;/control_DHours&gt;&lt;/Item&gt;&lt;/Items&gt;&lt;/RepeaterData&gt;
            &lt;?xml version=&quot;1.0&quot; encoding=&quot;utf-8&quot;?&gt;&lt;RepeaterData&gt;&lt;Version /&gt;&lt;Items&gt;&lt;Item&gt;&lt;NonProductiveTime type=&quot;System.String&quot;&gt;Additional duties&lt;/NonProductiveTime&gt;&lt;NonProductiveTimeMinutes type=&quot;System.Double&quot;&gt;**15**&lt;/NonProductiveTimeMinutes&gt;&lt;/Item&gt;&lt;/Items&gt;&lt;/RepeaterData&gt;
        
    


I would like to parse the XML column and split the xml ** values ** into separate rows
Expected result
       Date      Supervisor Nightshift  TestReference Discipline   WON  PurchaseOrder LabourType    WorkType TestEngineer EmployeeName  Nhours  Phours  Dhours
0 2021-10-27  Michael Myres         No           1203   Bodyshop  1234           1234   On Tools  Turnaround           Me       Elliot       1       2       3
1 2021-10-27  Michael Myres         No           1203   Bodyshop  1234           1234   On Tools  Turnaround           Me         Ryan       1       2       4
2 2021-10-27  Michael Myres         No           1203   Bodyshop  1234           1234   On Tools  Turnaround           Me         John       1       1       1

I have tried solution below but it just yields
    Version Items   Version Items   Version Items   Version Items   Version Items   Version Items
    None    None    None    None    None    None    None    None    None    None    None    None


import pandas as pd 
import xml.etree.ElementTree as ET

df = pd.read_csv('SharePointDataPull.csv')

xml_data = df['XML'][0] # test row 
root = ET.XML(xml_data)  # Parse XML

for i, child in enumerate(root):
    data.append([subchild.text for subchild in child])
    cols.append(child.tag)

df1 = pd.DataFrame(data).T  # Write in DF and transpose it
df1.columns = cols  # Update column names
df1

Thanks for your help!

Pandas: Parse xml column and split into separate rows

Answers (1)

Related Questions