Shenanigator
Shenanigator

Reputation: 1066

Having some challenges parsing XML with Python and xml.etree.ElementTree

<?xml version="1.0" encoding="utf-8" standalone="yes"?>
<feed xml:base="https://share.corp.com/sites/CPIBudget/_vti_bin/ListData.svc/" xmlns:d="http://schemas.microsoft.com/ado/2007/08/dataservices" xmlns:m="http://schemas.microsoft.com/ado/2007/08/dataservices/metadata" xmlns="http://www.w3.org/2005/Atom">
  <title type="text">Tbl_Projects_Tableau</title>
  <id>https://share.corp.com/sites/CPIBudget/_vti_bin/ListData.svc/Tbl_Projects_Tableau/</id>
  <updated>2018-07-25T21:27:59Z</updated>
  <link rel="self" title="Tbl_Projects_Tableau" href="Tbl_Projects_Tableau" />
  <entry m:etag="W/&quot;8&quot;">
    <id>https://share.corp.com/sites/CPIBudget/_vti_bin/ListData.svc/Tbl_Projects_Tableau(1)</id>
    <title type="text"></title>
    <updated>2018-06-14T17:15:27Z</updated>
    <author>
      <name />
    </author>
    <link rel="edit" title="Tbl_Projects_TableauItem" href="Tbl_Projects_Tableau(1)" />
    <link rel="http://schemas.microsoft.com/ado/2007/08/dataservices/related/FBN_ID" type="application/atom+xml;type=entry" title="FBN_ID" href="Tbl_Projects_Tableau(1)/FBN_ID" />
    <link rel="http://schemas.microsoft.com/ado/2007/08/dataservices/related/CreatedBy" type="application/atom+xml;type=entry" title="CreatedBy" href="Tbl_Projects_Tableau(1)/CreatedBy" />
    <link rel="http://schemas.microsoft.com/ado/2007/08/dataservices/related/ModifiedBy" type="application/atom+xml;type=entry" title="ModifiedBy" href="Tbl_Projects_Tableau(1)/ModifiedBy" />
    <category term="Microsoft.SharePoint.DataService.Tbl_Projects_TableauItem" scheme="http://schemas.microsoft.com/ado/2007/08/dataservices/scheme" />
    <content type="application/xml">
      <m:properties>
        <d:FBN_IDId m:type="Edm.Int32">6</d:FBN_IDId>
        <d:Title m:null="true" />
        <d:PROJECT_NAME>Project Swoop</d:PROJECT_NAME>
        <d:Cluster>ABC</d:Cluster>
        <d:PROJECT_SITE>ABC9</d:PROJECT_SITE>
        <d:PROJECT_ORIGINALAMT m:type="Edm.Double">500000</d:PROJECT_ORIGINALAMT>
        <d:PROJECT_ORG>Nookie</d:PROJECT_ORG>
        <d:PROJECT_GROUP>Smooth</d:PROJECT_GROUP>
        <d:c__OldID m:type="Edm.Double">1</d:c__OldID>
        <d:ContentTypeID>0x0100FD279BEBCF3C4F45BB75D6147D315C09</d:ContentTypeID>
        <d:Id m:type="Edm.Int32">1</d:Id>
        <d:ContentType>Item</d:ContentType>
        <d:Modified m:type="Edm.DateTime">2018-06-14T17:15:27</d:Modified>
        <d:Created m:type="Edm.DateTime">2018-06-14T16:58:50</d:Created>
        <d:CreatedById m:type="Edm.Int32">2</d:CreatedById>
        <d:ModifiedById m:type="Edm.Int32">2</d:ModifiedById>
        <d:Owshiddenversion m:type="Edm.Int32">8</d:Owshiddenversion>
        <d:Version>1.0</d:Version>
        <d:Path>/sites/SmoothBudget/Lists/Projects_Tableau1</d:Path>
      </m:properties>
    </content>
  </entry>
 </feed>

here is a sample of the XML I am trying to parse to CSV.

here is my code so far:

import config
import csv
import pymysql
import requests
import xml.etree.ElementTree as ET
from requests_ntlm import HttpNtlmAuth

ssoUsername = config.username
ssoPassword = config.password

f = open(path+csvFile,'w',newline='')
csvwriter = csv.writer(f)
column_headers = ['FBN','Project_Name','Cluster','Site','OP2_USD','Type','Group']
csvwriter.writerow(column_headers)

rows = []
r2 = requests.get(project_url, auth=HttpNtlmAuth('ANT\\'+ssoUsername,ssoPassword), verify=False)
projectData = r2.content
etree2 = ET.fromstring(projectData)

#print(etree2.findall('.****'))

for element in etree2.findall(".****") :
    print(element.find('{http://schemas.microsoft.com/ado/2007/08/dataservices}FBN_IDId'))
    fbnKey2 = element.find('{http://schemas.microsoft.com/ado/2007/08/dataservices}FBN_IDId')
    FBN = fbnMap.get(fbnKey2)

so at this point I cannot get the .text of the '{http://schemas.microsoft.com/ado/2007/08/dataservices}FBN_IDId' element. No matter what xPath I try it always gives me NoneType has not attribute text error.

Here are the results of print(etree2.findall('.****'))

[<Element '{http://schemas.microsoft.com/ado/2007/08/dataservices}ContentTypeID' at 0x000001C4B7F31BD8>, <Element '{http://schemas.microsoft.com/ado/2007/08/dataservices}Id' at 0x000001C4B7F31C28>, <Element '{http://schemas.microsoft.com/ado/2007/08/dataservices}ContentType' at 0x000001C4B7F31C78>, <Element '{http://schemas.microsoft.com/ado/2007/08/dataservices}Modified' at 0x000001C4B7F31CC8>, <Element '{http://schemas.microsoft.com/ado/2007/08/dataservices}Created' at 0x000001C4B7F31D18>, <Element '{http://schemas.microsoft.com/ado/2007/08/dataservices}CreatedById' at 0x000001C4B7F31D68>, <Element '{http://schemas.microsoft.com/ado/2007/08/dataservices}ModifiedById' at 0x000001C4B7F31DB8>, <Element '{http://schemas.microsoft.com/ado/2007/08/dataservices}Owshiddenversion' at 0x000001C4B7F31E08>, <Element '{http://schemas.microsoft.com/ado/2007/08/dataservices}Version' at 0x000001C4B7F31E58>, <Element '{http://schemas.microsoft.com/ado/2007/08/dataservices}Path' at 0x000001C4B7F31EA8>, <Element '{http://schemas.microsoft.com/ado/2007/08/dataservices}FBN_IDId' at 0x000001C4B7F3ABD8>, <Element '{http://schemas.microsoft.com/ado/2007/08/dataservices}Title' at 0x000001C4B7F3AB88>, <Element '{http://schemas.microsoft.com/ado/2007/08/dataservices}PROJECT_NAME' at 0x000001C4B7F3AB38>, <Element '{http://schemas.microsoft.com/ado/2007/08/dataservices}Cluster' at 0x000001C4B7F3AA98>, <Element '{http://schemas.microsoft.com/ado/2007/08/dataservices}PROJECT_SITE' at 0x000001C4B7F3AA48>, <Element '{http://schemas.microsoft.com/ado/2007/08/dataservices}PROJECT_ORIGINALAMT' at 0x000001C4B7F3A9F8>, <Element '{http://schemas.microsoft.com/ado/2007/08/dataservices}PROJECT_ORG' at 0x000001C4B7F3A908>, <Element '{http://schemas.microsoft.com/ado/2007/08/dataservices}PROJECT_GROUP' at 0x000001C4B7F3A868>, <Element '{http://schemas.microsoft.com/ado/2007/08/dataservices}c__OldID' at 0x000001C4B7F3A8B8>, <Element '{http://schemas.microsoft.com/ado/2007/08/dataservices}ContentTypeID' at 0x000001C4B7F3A778>, <Element '{http://schemas.microsoft.com/ado/2007/08/dataservices}Id' at 0x000001C4B7F3A728>, <Element '{http://schemas.microsoft.com/ado/2007/08/dataservices}ContentType' at 0x000001C4B7F3A6D8>, <Element '{http://schemas.microsoft.com/ado/2007/08/dataservices}Modified' at 0x000001C4B7F3A138>, <Element '{http://schemas.microsoft.com/ado/2007/08/dataservices}Created' at 0x000001C4B7F3A048>, <Element '{http://schemas.microsoft.com/ado/2007/08/dataservices}CreatedById' at 0x000001C4B7F3A638>, <Element '{http://schemas.microsoft.com/ado/2007/08/dataservices}ModifiedById' at 0x000001C4B7F3A5E8>, <Element '{http://schemas.microsoft.com/ado/2007/08/dataservices}Owshiddenversion' at 0x000001C4B7F3A548>, <Element '{http://schemas.microsoft.com/ado/2007/08/dataservices}Version' at 0x000001C4B7F3A598>, <Element '{http://schemas.microsoft.com/ado/2007/08/dataservices}Path' at 0x000001C4B7F3A4F8>]

It seems like I should be able to get the FBNIDId, but the best I have been able to do is get

None
<Element '{http://schemas.microsoft.com/ado/2007/08/dataservices}FBN_IDId' at 0x000001C4B7F3ABD8>

and that results in the none type error. The only time I was able to get it to work at all was to do :

for element in etree2.findall(".//{http://schemas.microsoft.com/ado/2007/08/dataservices}FBN_IDId") :
        fbnKey2 = element.text
        FBN = fbnMap.get(fbnKey2)

but if I do that then I have to do that for each element I need and then figure out how to combine them all into a single row and then loop through for adding all the rows and that seems wrong.

suggestions?

Upvotes: 1

Views: 206

Answers (2)

Luke Woodward
Luke Woodward

Reputation: 65054

In your loop

for element in etree2.findall(".****") :

you are iterating through all elements four levels down. For each such element you then search within its child elements for those named FBN_IDId. This will find any such elements that are five levels down.

However, there aren't any such elements. They only exist four levels down.

Perhaps you want to iterate through all elements three levels down and look for child elements of these, named FBN_IDId? This could be done by removing one of the *s in your loop:

for element in etree2.findall(".***") :

However this loop would also find the empty <name /> element.

Perhaps it would be better to write

for element in etree2.findall(".//{http://schemas.microsoft.com/ado/2007/08/dataservices/metadata}properties") :

which loops through all <m:properties> elements at any depth.

Upvotes: 0

Danny_ds
Danny_ds

Reputation: 11406

Maybe you could use the full path:

/feed/entry/content/m:properties/d:FBN_IDId

or:

/feed/entry/content/{http://schemas.microsoft.com/ado/2007/08/dataservices/metadata}properties/{http://schemas.microsoft.com/ado/2007/08/dataservices}FBN_IDId

Upvotes: 1

Related Questions