Reputation: 39
I'm trying to extract some data from the following XML file.
<?xml version="1.0" encoding="utf-8"?>
<go-home-1:GOHOMEV1 xmlns:go-home-1="https://sample.com/GO-HOME-V1">
<HOMEV1FileHeader>
<FileCreationTimestamp>2020-02-15T08:29:22+01:00</FileCreationTimestamp>
<FileType>AB716</FileType>
<SGO>YIFG</SGO>
</HOMEV1FileHeader>
<OI>
<ON>YIFG4</ON>
<CI>HYU</CI>
<NL>
<NT>
<GOCode>HYU34</GOCode>
<NTName>HYUFFT - 11</NTName>
<NTData>
<RIS>
<RI>
<EDC>2020-01-18</EDC>
<E4NS>
<GNS>
<RD>
<NR>
<CC>9012</CC>
<NDC>411</NDC>
<SRng>
<SRngStart>000</SRngStart>
<SRngStop>999</SRngStop>
</SRng>
</NR>
</RD>
<RD>
<NR>
<CC>834</CC>
<NDC>101</NDC>
<SRng>
<SRngStart>150</SRngStart>
<SRngStop>295</SRngStop>
</SRng>
</NR>
</RD>
</GNS>
</E4NS>
<E2NS>
<MCC>111</MCC>
<MNC>222</MNC>
</E2NS>
<E2G>
<MGT_CC>9012</MGT_CC>
<MGT_NC>4113</MGT_NC>
</E2G>
</RI>
</RIS>
</NTData>
</NT>
</NL>
</OI>
</go-home-1:GOHOMEV1>
My expected output is like below, with SGO as the first field.
My attempt is below (taking ideas from "Getting all children of a node using xml.etree.ElementTree"), but I'm getting errors or empty lists (for sgo = root.find(...) and A = root.findall(...)), and I'm stuck. Thanks for any help.
import xml.etree.ElementTree as ET
import glob, os

filename = "file.xml"
namespaces = {
    "go-home-1": "https://sample.com/GO-HOME-V1"
}
root = ET.parse(filename).getroot()

# For this sgo = root.find()... I get ERROR << AttributeError: 'NoneType' object has no attribute 'text'>>
sgo = root.find("go-home-1:HOMEV1FileHeader/"
                "go-home-1:SGO", namespaces).text

### For below I'm getting empty list A = [] and I don't know why.
A = root.findall(
    "go-home-1:OI/go-home-1:NL/go-home-1:NT[1]/go-home-1:NTData/go-home-1:RIS/go-home-1:RI/go-home-1:E4NS/"
    "go-home-1:GNS/"
    "go-home-1:RD/"
    "go-home-1:NR", namespaces)

for item1 in A:
    Result = [sgo]
    cc = item1.find("go-home-1:CC", namespaces).text
    ndc = item1.find("go-home-1:NDC", namespaces).text
    Result.append(cc)
    Result.append(ndc)
    B = item1.findall(
        "go-home-1:OI/go-home-1:NL/go-home-1:NT[1]/go-home-1:NTData/go-home-1:RIS/go-home-1:RI/go-home-1:E4NS/"
        "go-home-1:GNS/"
        "go-home-1:RD/"
        "go-home-1:NR/"
        "go-home-1:SRng", namespaces)
    for item2 in B:
        RngStart = item2.find("go-home-1:SRngStart", namespaces).text
        RngStop = item2.find("go-home-1:SRngStop", namespaces).text
        Result.append(RngStart)
        Result.append(RngStop)
    print(Result)
Upvotes: 1
Views: 331
Reputation: 24930
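In this particular XML, and given the expected output, namespaces aren't really necessary: only the root element carries the go-home-1 prefix, while the child elements are unqualified, so the prefixed paths in your attempt match nothing. Additionally, I think the best way to present your output is a pandas DataFrame.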
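On the namespace point, a quick way to check it is to print the tags ElementTree actually sees. The snippet below is only a sketch: it uses a trimmed copy of the XML from the question (header only, XML declaration omitted), and the variable names are illustrative.

```python
import xml.etree.ElementTree as ET

# Trimmed copy of the question's XML: only the root element has the prefix.
xml_text = """<go-home-1:GOHOMEV1 xmlns:go-home-1="https://sample.com/GO-HOME-V1">
  <HOMEV1FileHeader>
    <SGO>YIFG</SGO>
  </HOMEV1FileHeader>
</go-home-1:GOHOMEV1>"""

root = ET.fromstring(xml_text)
namespaces = {"go-home-1": "https://sample.com/GO-HOME-V1"}

# The root tag is namespace-qualified; the child tags are not.
root_tag = root.tag
child_tags = [child.tag for child in root]

# A prefixed path looks for namespace-qualified children, which do not exist here.
prefixed = root.find("go-home-1:HOMEV1FileHeader/go-home-1:SGO", namespaces)

# The same path without prefixes matches the unqualified elements.
plain = root.find("HOMEV1FileHeader/SGO")

print(root_tag)    # {https://sample.com/GO-HOME-V1}GOHOMEV1
print(child_tags)  # ['HOMEV1FileHeader']
print(prefixed)    # None
print(plain.text)  # YIFG
```

Since find() returns None for the prefixed path, calling .text on it raises the AttributeError from the question, and findall() with prefixed paths returns an empty list for the same reason.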
import xml.etree.ElementTree as ET
import pandas as pd

# Parse the same file as in the question
root = ET.parse("file.xml").getroot()

columns = ['SGO', 'MCC', 'MNC', 'MGT_CC', 'MGT_NC', 'CC', 'NDC', 'SRngStart', 'SRngStop']

# These values occur once in the file and are repeated on every row
sgo = root.find('.//SGO').text
mcc = root.find('.//MCC').text
mnc = root.find('.//MNC').text
mgt_cc = root.find('.//MGT_CC').text
mgt_nc = root.find('.//MGT_NC').text

rows = []
# One output row per <RD> entry
for entry in root.findall('.//RD'):
    cc = entry.find('.//CC').text
    ndc = entry.find('.//NDC').text
    srngstart = entry.find('.//SRngStart').text
    srngstop = entry.find('.//SRngStop').text
    rows.append([sgo, mcc, mnc, mgt_cc, mgt_nc, cc, ndc, srngstart, srngstop])

df = pd.DataFrame(rows, columns=columns)
print(df)
Output:
    SGO  MCC  MNC  MGT_CC  MGT_NC    CC  NDC  SRngStart  SRngStop
0  YIFG  111  222    9012    4113  9012  411        000       999
1  YIFG  111  222    9012    4113   834  101        150       295
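If pandas isn't available, or some elements might be missing in other files, the same extraction can be sketched with the standard library only. This is an illustration rather than a drop-in replacement: extract_rows is a hypothetical helper name, the XML is a condensed copy of the one in the question, and findtext() is used with default="" so a missing element yields an empty string instead of the 'NoneType' error.

```python
import xml.etree.ElementTree as ET

# Condensed copy of the question's XML (same elements, whitespace trimmed).
XML_TEXT = """<go-home-1:GOHOMEV1 xmlns:go-home-1="https://sample.com/GO-HOME-V1">
  <HOMEV1FileHeader><SGO>YIFG</SGO></HOMEV1FileHeader>
  <OI><NL><NT><NTData><RIS><RI>
    <E4NS><GNS>
      <RD><NR><CC>9012</CC><NDC>411</NDC>
        <SRng><SRngStart>000</SRngStart><SRngStop>999</SRngStop></SRng></NR></RD>
      <RD><NR><CC>834</CC><NDC>101</NDC>
        <SRng><SRngStart>150</SRngStart><SRngStop>295</SRngStop></SRng></NR></RD>
    </GNS></E4NS>
    <E2NS><MCC>111</MCC><MNC>222</MNC></E2NS>
    <E2G><MGT_CC>9012</MGT_CC><MGT_NC>4113</MGT_NC></E2G>
  </RI></RIS></NTData></NT></NL></OI>
</go-home-1:GOHOMEV1>"""


def extract_rows(root):
    """Return one list per <RD>/<NR> entry, prefixed with the file-level fields."""
    # File-level values: first match anywhere in the tree, "" if absent.
    shared = [root.findtext(f".//{tag}", default="")
              for tag in ("SGO", "MCC", "MNC", "MGT_CC", "MGT_NC")]
    rows = []
    for nr in root.findall(".//RD/NR"):
        # Paths are relative to the current <NR> element.
        per_entry = [nr.findtext(path, default="")
                     for path in ("CC", "NDC", "SRng/SRngStart", "SRng/SRngStop")]
        rows.append(shared + per_entry)
    return rows


rows = extract_rows(ET.fromstring(XML_TEXT))
for row in rows:
    print(",".join(row))
# YIFG,111,222,9012,4113,9012,411,000,999
# YIFG,111,222,9012,4113,834,101,150,295
```

Note that the per-entry paths are relative to each NR element; searching from the item with the full path from the root, as in the question's inner findall(), would not match even without the namespace prefixes.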
Upvotes: 1