Marta

Reputation: 95

Getting data from broken xml in Python

I would like to get data from XML, but its structure seems to be broken.

I have this example URL: https://b2b.snapoutdoor.pl/rest/V1/extendvariantstocart/73478, which returns XML with data about the product.

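import requests
import json
from xml.etree import ElementTree
from pprint import pprint

# Ask the REST endpoint for an XML response
response = requests.get(
    "https://b2b.snapoutdoor.pl/rest/V1/extendvariantstocart/86559",
    headers={"Accept": "application/xml"},
)

# The root element's text holds a JSON document, not nested XML elements
node = ElementTree.fromstring(response.content)

# Decode that JSON text into a Python dict and print it
data = json.loads(node.text)
pprint(data)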

This returns a dict with four keys:

{'jsonChildsConfig': '{"70259":{"id":"70259","name":"Ski Ultra Merino E - '
                     'black\\/orange","sku":"610306139887","availableQty":6,"regularPrice":69.2367,"finalPrice":69.2367,"promo":false,"discount":0,"bestDiscount":false,"addToCartUrl":"https:\\/\\/b2b.snapoutdoor.pl\\/checkout\\/cart\\/add\\/uenc\\/aHR0cHM6Ly9iMmIuc25hcG91dGRvb3IucGwvcmVzdC9WMS9leHRlbmR2YXJpYW50c3RvY2FydC84NjU1OQ%2C%2C\\/product\\/86559\\/","formKey":"7OWS6VbWucoSg2zg","superAttributes":"36-39 '
                     '","salable":true},"70260":{"id":"70260","name":"Ski '
                     'Ultra Merino E - '
                     'black\\/orange","sku":"610306139894","availableQty":7,"regularPrice":69.2367,"finalPrice":69.2367,"promo":false,"discount":0,"bestDiscount":false,"addToCartUrl":"https:\\/\\/b2b.snapoutdoor.pl\\/checkout\\/cart\\/add\\/uenc\\/aHR0cHM6Ly9iMmIuc25hcG91dGRvb3IucGwvcmVzdC9WMS9leHRlbmR2YXJpYW50c3RvY2FydC84NjU1OQ%2C%2C\\/product\\/86559\\/","formKey":"7OWS6VbWucoSg2zg","superAttributes":"40-43 '
                     '","salable":true},"70261":{"id":"70261","name":"Ski '
                     'Ultra Merino E - '
                     'black\\/orange","sku":"610306139900","availableQty":6,"regularPrice":69.2367,"finalPrice":69.2367,"promo":false,"discount":0,"bestDiscount":false,"addToCartUrl":"https:\\/\\/b2b.snapoutdoor.pl\\/checkout\\/cart\\/add\\/uenc\\/aHR0cHM6Ly9iMmIuc25hcG91dGRvb3IucGwvcmVzdC9WMS9leHRlbmR2YXJpYW50c3RvY2FydC84NjU1OQ%2C%2C\\/product\\/86559\\/","formKey":"7OWS6VbWucoSg2zg","superAttributes":"44-47 '
                     '","salable":true},"99060":{"id":"99060","name":"Ski '
                     'Ultra Merino E - '
                     'black\\/orange","sku":"610306139917","availableQty":3,"regularPrice":69.24,"finalPrice":69.24,"promo":false,"discount":0,"bestDiscount":false,"addToCartUrl":"https:\\/\\/b2b.snapoutdoor.pl\\/checkout\\/cart\\/add\\/uenc\\/aHR0cHM6Ly9iMmIuc25hcG91dGRvb3IucGwvcmVzdC9WMS9leHRlbmR2YXJpYW50c3RvY2FydC84NjU1OQ%2C%2C\\/product\\/86559\\/","formKey":"7OWS6VbWucoSg2zg","superAttributes":"48+ '
                     '","salable":true}}',
 'jsonConfig': 'some data',
 'jsonDefaultPlaceholder': 'https://b2b.snapoutdoor.pl/pub/media/catalog/product/placeholder/',
 'jsonSwatchConfig': 'some data'
}

I'm interested in the value of jsonChildsConfig, but when I try to access keys inside it, I get TypeError: string indices must be integers, because the value of jsonChildsConfig is a string.
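The error comes from the data being JSON-encoded twice: the XML root text is a JSON object, and the jsonChildsConfig value inside it is itself a JSON document stored as a string, so it needs a second json.loads. A minimal offline sketch of that situation, assuming a hypothetical, shortened payload modeled on the output above (only three fields kept, no network request):

```python
import json

# Hypothetical, shortened stand-in for node.text: an outer JSON object whose
# "jsonChildsConfig" value is itself a JSON document serialized as a string.
inner = {"70259": {"id": "70259", "sku": "610306139887", "availableQty": 6}}
raw = json.dumps({"jsonChildsConfig": json.dumps(inner)})

data = json.loads(raw)  # first decode: the outer object
print(type(data["jsonChildsConfig"]))  # <class 'str'>, still encoded

try:
    data["jsonChildsConfig"]["70259"]  # indexing a str with a str key
except TypeError as exc:
    print("TypeError:", exc)

children = json.loads(data["jsonChildsConfig"])  # second decode: the inner object
print(children["70259"]["sku"])  # 610306139887
```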

I would like to get all SKU and stock values (the sku and availableQty fields), but since they are inside that string, I can't get them with

data['jsonChildsConfig']['70259']['sku']

or

data['jsonChildsConfig']['70259']['availableQty']
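Once the inner string has been decoded with json.loads, collecting every SKU together with its stock is a single dict comprehension over the child entries. A sketch, assuming a hypothetical, shortened copy of the jsonChildsConfig string (only sku and availableQty kept, values copied from the output above); stock_by_sku is a hypothetical helper name:

```python
import json

# Hypothetical, shortened copy of the jsonChildsConfig string from the output above
sample_children_json = (
    '{"70259": {"sku": "610306139887", "availableQty": 6},'
    ' "70260": {"sku": "610306139894", "availableQty": 7},'
    ' "70261": {"sku": "610306139900", "availableQty": 6},'
    ' "99060": {"sku": "610306139917", "availableQty": 3}}'
)


def stock_by_sku(children_json: str) -> dict:
    """Decode the child-variant JSON string and map each SKU to its quantity."""
    children = json.loads(children_json)
    return {child["sku"]: child["availableQty"] for child in children.values()}


stock = stock_by_sku(sample_children_json)
for sku, qty in stock.items():
    print(sku, qty)
```

With the real response, passing data['jsonChildsConfig'] should give a result of the same shape; the escaped \/ sequences in the payload are ordinary JSON escapes that json.loads resolves.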

I also tried to convert this string to JSON with json.loads(), but it didn't work.

Could you please help me with it? 🙏🙂

Upvotes: 0

Views: 96

Answers (2)

Massifox

Reputation: 4487

To fix your dictionary, apply json.loads to all the values of your dictionary except 'jsonDefaultPlaceholder', which is not in JSON format:

del data['jsonDefaultPlaceholder']
new_data = {k: json.loads(v) for k, v in data.items() if v}
new_data['jsonChildsConfig']['70259']['sku']

# output: '610306139887'
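Instead of deleting 'jsonDefaultPlaceholder' first, one option is to attempt json.loads on every value and keep any value that fails to decode unchanged. A sketch under that assumption; decode_json_values is a hypothetical helper and the sample dict is a shortened stand-in for the question's data:

```python
import json


def decode_json_values(data: dict) -> dict:
    """Decode every value that is valid JSON; keep the others unchanged."""
    decoded = {}
    for key, value in data.items():
        try:
            decoded[key] = json.loads(value)
        except (TypeError, json.JSONDecodeError):
            # Not a JSON string (e.g. a plain URL or None): keep the original value
            decoded[key] = value
    return decoded


# Hypothetical, shortened sample shaped like the question's data
data = {
    "jsonChildsConfig": '{"70259": {"sku": "610306139887", "availableQty": 6}}',
    "jsonDefaultPlaceholder": "https://b2b.snapoutdoor.pl/pub/media/catalog/product/placeholder/",
}
new_data = decode_json_values(data)
print(new_data["jsonChildsConfig"]["70259"]["sku"])  # 610306139887
print(new_data["jsonDefaultPlaceholder"])
```

One caveat of this approach: a plain string value that happens to be valid JSON, such as "123", would also be converted.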

Or, if you want to convert the numeric keys that interest you into integers:

del data['jsonDefaultPlaceholder']
new_data2 = {
    k: {(int(key) if key.isdigit() else key): val for key, val in json.loads(v).items()}
    for k, v in data.items()
    if v
}
new_data2['jsonChildsConfig'][70259]['sku']

# output: '610306139887'
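The same key conversion can also be done while decoding, through the object_pairs_hook parameter of json.loads. A sketch, with int_keys as a hypothetical helper name; note that the hook runs for every nested object, so unlike the comprehension above it converts all-digit keys at every level, not only the top one:

```python
import json


def int_keys(pairs):
    """object_pairs_hook: turn all-digit keys into ints, keep the rest as str."""
    return {(int(k) if k.isdigit() else k): v for k, v in pairs}


# Hypothetical, shortened jsonChildsConfig string
children_json = '{"70259": {"sku": "610306139887", "availableQty": 6}}'
children = json.loads(children_json, object_pairs_hook=int_keys)
print(children[70259]["sku"])  # 610306139887
```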

Upvotes: 2

abhilb

Reputation: 5757

Converting the value of data['jsonChildsConfig'] to a dict using json.loads should work:

>>> childConfigDetails = json.loads(data['jsonChildsConfig'])
>>> childConfigDetails['70259']['sku']
'610306139887'

Upvotes: 0
