Reputation: 277
The UK National Gas system publishes a mass of data that can be access from a SOAP server and an example of the returned data (for LNG) is shown below. I've written the code to generate the request and to handle the response but am tripping up on how to extract the returned information. The aim would be to upload the data into a backend database or into a Pandas dataframe.
In previous code, I've simply traversed the XML using XPATH and then iterated over the tag and extracted out the child data. Thus, I was hoping to extract:
GetPublicationDataWMResult, ApplicableAt, ApplicableFor, Value, ...
LNG Stock Level,2016-03-13T15:00:07Z, 2016-03-12T00:00:00Z, 7050.42286, ...
LNG Capacity,2016-03-13T15:00:07Z, 2016-03-12T00:00:00Z, 6515042480, ...
Having tried to use XPATH to traverse the children (/Envelope/Body/GetPublicationDataWMResponse/GetPublicationDataWMResult/) it is failing.
The logic works if I sanitize the code by adding a series of string removals but that's sub-optimal and bound to break in the future.
EXAMPLE CODE:
import requests
from lxml import objectify
def getXML():
toDate = "2016-03-12"
fromDate = "2016-03-12"
dateType = "gasday"
url="http://marketinformation.natgrid.co.uk/MIPIws-public/public/publicwebservice.asmx"
headers = {'content-type': 'application/soap+xml; charset=utf-8'}
body ="""<soap12:Envelope xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance" xmlns:xsd="http://www.w3.org/2001/XMLSchema" xmlns:soap12="http://www.w3.org/2003/05/soap-envelope">
<soap12:Body>
<GetPublicationDataWM xmlns="http://www.NationalGrid.com/MIPI/">
<reqObject>
<LatestFlag>Y</LatestFlag>
<ApplicableForFlag>Y</ApplicableForFlag>
<ToDate>%s</ToDate>
<FromDate>%s</FromDate>
<DateType>%s</DateType>
<PublicationObjectNameList>
<string>LNG Stock Level</string>
<string>LNG, Daily Aggregated Available Capacity, D+1</string>
</PublicationObjectNameList>
</reqObject>
</GetPublicationDataWM>
</soap12:Body>
</soap12:Envelope>""" % (toDate, fromDate,dateType)
response = requests.post(url,data=body,headers=headers)
return response.content
root = objectify.fromstring(getXML())
Returned XML:
<?xml version="1.0" encoding="utf-8"?>
<soap:Envelope
xmlns:soap="http://www.w3.org/2003/05/soap-envelope"
xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance"
xmlns:xsd="http://www.w3.org/2001/XMLSchema">
<soap:Body>
<GetPublicationDataWMResponse
xmlns="http://www.NationalGrid.com/MIPI/">
<GetPublicationDataWMResult>
<CLSMIPIPublicationObjectBE>
<PublicationObjectName>LNG Stock Level</PublicationObjectName>
<PublicationObjectData>
<CLSPublicationObjectDataBE>
<ApplicableAt>2016-03-13T15:00:07Z</ApplicableAt>
<ApplicableFor>2016-03-12T00:00:00Z</ApplicableFor>
<Value>7050.42286</Value>
<GeneratedTimeStamp>2016-03-13T15:56:00Z</GeneratedTimeStamp>
<QualityIndicator></QualityIndicator>
<Substituted>N</Substituted>
<CreatedDate>2016-03-13T15:56:28Z</CreatedDate>
</CLSPublicationObjectDataBE>
</PublicationObjectData>
</CLSMIPIPublicationObjectBE>
<CLSMIPIPublicationObjectBE>
<PublicationObjectName>LNG Capacity</PublicationObjectName>
<PublicationObjectData>
<CLSPublicationObjectDataBE>
<ApplicableAt>2016-03-12T15:30:00Z</ApplicableAt>
<ApplicableFor>2016-03-12T00:00:00Z</ApplicableFor>
<Value>6515042480</Value>
<GeneratedTimeStamp>2016-03-12T16:00:00Z</GeneratedTimeStamp>
<QualityIndicator></QualityIndicator>
<Substituted>N</Substituted>
<CreatedDate>2016-03-12T16:00:20Z</CreatedDate>
</CLSPublicationObjectDataBE>
</PublicationObjectData>
</CLSMIPIPublicationObjectBE>
</GetPublicationDataWMResult>
</GetPublicationDataWMResponse>
</soap:Body>
</soap:Envelope>
Upvotes: 3
Views: 4128
Reputation: 89285
This is a FAQ in XML+XPath topic that involves XML with default namespace.
XML element where default namespace is declared and its descendant elements without prefix inherits the same default namespace implicitly. And in XPath expression, to reference element in namespace you need to use prefix that has been mapped to the corresponding namespace URI. Using lxml
the codes will be about like the following :
root = etree.fromstring(getXML())
# map prefix 'd' to the default namespace URI
ns = { 'd': 'http://www.NationalGrid.com/MIPI/'}
publication_objects = root.xpath('//d:CLSMIPIPublicationObjectBE', namespaces=ns)
for obj in publication_objects:
name = obj.find('d:PublicationObjectName', ns).text
data = obj.find('d:PublicationObjectData/d:CLSPublicationObjectDataBE', ns)
applicable_at = data.find('d:ApplicableAt', ns).text
applicable_for = data.find('d:ApplicableFor', ns).text
# todo: extract other relevant data and process as needed
Upvotes: 1
Reputation: 7941
Using your existing code I just added this:
res= getXML()
from bs4 import BeautifulSoup
soup = BeautifulSoup(res, 'html.parser')
searchTerms= ['PublicationObjectName','ApplicableAt','ApplicableFor','Value']
# LNG Stock Level,2016-03-13T15:00:07Z, 2016-03-12T00:00:00Z, 7050.42286, ...
for st in searchTerms:
print st+'\t',
print soup.find(st.lower()).contents[0]
Output:
PublicationObjectName LNG Stock Level
ApplicableAt 2016-03-13T15:00:07Z
ApplicableFor 2016-03-12T00:00:00Z
Value 7050.42286
Upvotes: 1