Reputation: 135
Using Python 3.5.2, what is the best way to convert a string into a list of dictionaries?
I'm scraping a site, with the following being returned as a list of length 1:
(Formatted for readability)
[
{"variation_id":573,"variation_is_visible":true,"variation_is_active":true,"is_purchasable":true,"display_price":1099,"display_regular_price":1099,"attributes":{"attribute_pa_size":"king"},"image_src":"","image_link":"","image_title":"","image_alt":"","image_caption":"","image_srcset":"","image_sizes":"","price_html":"<span class=\"price\"><span class=\"woocommerce-Price-amount amount\"><span class=\"woocommerce-Price-currencySymbol\">R<\/span>1,099.00<\/span><\/span>","availability_html":"<p class=\"stock in-stock\">2 in stock<\/p>","sku":"6006239211693","weight":" kg","dimensions":"","min_qty":1,"max_qty":2,"backorders_allowed":false,"is_in_stock":true,"is_downloadable":false,"is_virtual":false,"is_sold_individually":"no","variation_description":""},
{"variation_id":574,"variation_is_visible":true,"variation_is_active":true,"is_purchasable":true,"display_price":989,"display_regular_price":989,"attributes":{"attribute_pa_size":"queen"},"image_src":"","image_link":"","image_title":"","image_alt":"","image_caption":"","image_srcset":"","image_sizes":"","price_html":"<span class=\"price\"><span class=\"woocommerce-Price-amount amount\"><span class=\"woocommerce-Price-currencySymbol\">R<\/span>989.00<\/span><\/span>","availability_html":"<p class=\"stock in-stock\">2 in stock<\/p>","sku":"6006239211686","weight":" kg","dimensions":"","min_qty":1,"max_qty":2,"backorders_allowed":false,"is_in_stock":true,"is_downloadable":false,"is_virtual":false,"is_sold_individually":"no","variation_description":""},
{"variation_id":575,"variation_is_visible":true,"variation_is_active":true,"is_purchasable":true,"display_price":949,"display_regular_price":949,"attributes":{"attribute_pa_size":"double"},"image_src":"","image_link":"","image_title":"","image_alt":"","image_caption":"","image_srcset":"","image_sizes":"","price_html":"<span class=\"price\"><span class=\"woocommerce-Price-amount amount\"><span class=\"woocommerce-Price-currencySymbol\">R<\/span>949.00<\/span><\/span>","availability_html":"<p class=\"stock in-stock\">2 in stock<\/p>","sku":"6006239211679","weight":" kg","dimensions":"","min_qty":1,"max_qty":2,"backorders_allowed":false,"is_in_stock":true,"is_downloadable":false,"is_virtual":false,"is_sold_individually":"no","variation_description":""}
]
I tried converting that to a str, assigning it to 's' and then using json.loads(s), but that didn't work.
I'd like to have a list object whereby I can access values with something like:
for item in form_data_returned:
    print(item['variation_id'])  # prints 573 574 575
Thanks
Upvotes: 1
Views: 1045
Reputation: 236
Use the re module to preprocess the string, then use the json module to parse it into Python objects (here, a list of dictionaries).
Assuming you have the data converted to a string, and you know that certain rules apply to the content*, you can try the following:
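import json
import re
# Avoid naming variables str or dict, which would shadow the built-ins.
s = '...'  # the scraped data as a single string
escaped = re.sub('(?<=[^,:{}])(\\\")(?=[^,:{}])', '\\"', s)
data = json.loads(escaped)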
The regular expression (?<=[^,:{}])(\\\")(?=[^,:{}]) scans the string and matches every " character that is neither immediately preceded nor immediately followed by one of ',' , ':' , '{' , '}', so that the " characters inside the string values can be escaped properly.
*By "rules" I mean that you have to know that the regular expression finds the correct characters. If the data source can guarantee that consistency, the code above should work (extend the (?<=[^,:{}]) and (?=[^,:{}]) parts with the necessary characters to match all data).
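To make the effect of the regular expression concrete, here is a minimal sketch on a made-up input. The string `broken` and its field names are assumptions for illustration only, not the scraped data: the inner quotes around price are unescaped, so json.loads would reject the string until they are escaped.

```python
import json
import re

# Hypothetical input: the quotes around "price" are not escaped,
# so json.loads(broken) would raise an error as it stands.
broken = '{"html":"<span class="price">R1,099.00</span>","sku":"6006239211693"}'

# Match a double quote whose neighbours on both sides are
# something other than , : { or }.
pattern = r'(?<=[^,:{}])"(?=[^,:{}])'

# The replacement template r'\\"' produces a backslash followed by a quote.
escaped = re.sub(pattern, r'\\"', broken)

data = json.loads(escaped)
print(data["html"])
```

Note the limitation: an inner quote that happens to sit right next to a comma, colon or brace is not matched, so this only helps when the data is as regular as assumed.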
Upvotes: 0
Reputation: 3318
from collections import defaultdict
# Set aliases for `true` and `false` in the output so
# we won't get NameError exceptions thrown.
true = True
false = False
raw = [
{"variation_id":573,"variation_is_visible":true,"variation_is_active":true,"is_purchasable":true,"display_price":1099,"display_regular_price":1099,"attributes":{"attribute_pa_size":"king"},"image_src":"","image_link":"","image_title":"","image_alt":"","image_caption":"","image_srcset":"","image_sizes":"","price_html":"<span class=\"price\"><span class=\"woocommerce-Price-amount amount\"><span class=\"woocommerce-Price-currencySymbol\">R<\/span>1,099.00<\/span><\/span>","availability_html":"<p class=\"stock in-stock\">2 in stock<\/p>","sku":"6006239211693","weight":" kg","dimensions":"","min_qty":1,"max_qty":2,"backorders_allowed":false,"is_in_stock":true,"is_downloadable":false,"is_virtual":false,"is_sold_individually":"no","variation_description":""},
{"variation_id":574,"variation_is_visible":true,"variation_is_active":true,"is_purchasable":true,"display_price":989,"display_regular_price":989,"attributes":{"attribute_pa_size":"queen"},"image_src":"","image_link":"","image_title":"","image_alt":"","image_caption":"","image_srcset":"","image_sizes":"","price_html":"<span class=\"price\"><span class=\"woocommerce-Price-amount amount\"><span class=\"woocommerce-Price-currencySymbol\">R<\/span>989.00<\/span><\/span>","availability_html":"<p class=\"stock in-stock\">2 in stock<\/p>","sku":"6006239211686","weight":" kg","dimensions":"","min_qty":1,"max_qty":2,"backorders_allowed":false,"is_in_stock":true,"is_downloadable":false,"is_virtual":false,"is_sold_individually":"no","variation_description":""},
{"variation_id":575,"variation_is_visible":true,"variation_is_active":true,"is_purchasable":true,"display_price":949,"display_regular_price":949,"attributes":{"attribute_pa_size":"double"},"image_src":"","image_link":"","image_title":"","image_alt":"","image_caption":"","image_srcset":"","image_sizes":"","price_html":"<span class=\"price\"><span class=\"woocommerce-Price-amount amount\"><span class=\"woocommerce-Price-currencySymbol\">R<\/span>949.00<\/span><\/span>","availability_html":"<p class=\"stock in-stock\">2 in stock<\/p>","sku":"6006239211679","weight":" kg","dimensions":"","min_qty":1,"max_qty":2,"backorders_allowed":false,"is_in_stock":true,"is_downloadable":false,"is_virtual":false,"is_sold_individually":"no","variation_description":""}
]
# keys being a set ensures that every key occurs only once.
keys = set()
# Initializing form_data_returned as a defaultdict allows
# us to access keys that are not already in form_data_returned.
# For example form_data_returned['weight'].append('kg') would throw
# KeyError exception for an empty form_data_returned had we declared
# it as a normal dict().
form_data_returned = defaultdict(list)
for dictionary in raw:
    keys.update(dictionary.keys())
    for key in keys:
        form_data_returned[key].append(dictionary[key])
We can now retrieve data by key:
print(form_data_returned['variation_id'])
# Output: [573, 574, 575]
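The defaultdict above groups the values by key. If you would rather keep one dictionary per item, as in the loop from the question, a possible sketch follows. It assumes the scraper returns a list of length 1 whose only element is the JSON text; the name `scraped` and the shortened data are stand-ins, not the real response. Note that calling str() on the list itself produces the list's Python repr, not JSON, which may be why json.loads failed in the question.

```python
import json

# Hypothetical stand-in for the scraper's result: a list of length 1 whose
# only element is the JSON text (shortened to three fields per item).
scraped = [
    r'[{"variation_id":573,"is_in_stock":true,"price_html":"<span class=\"price\">R<\/span>1,099.00"},'
    r'{"variation_id":574,"is_in_stock":true,"price_html":"<span class=\"price\">R<\/span>989.00"}]'
]

# Take the element out of the list instead of converting the list with str().
raw_text = scraped[0]
if isinstance(raw_text, bytes):
    # json.loads only accepts str on Python 3.5; bytes are accepted from 3.6.
    raw_text = raw_text.decode("utf-8")

# json.loads maps true/false to True/False and handles \" and \/ escapes.
form_data_returned = json.loads(raw_text)

ids = [item["variation_id"] for item in form_data_returned]
for item in form_data_returned:
    print(item["variation_id"])
```

With this approach no aliases for true and false are needed, because the text is parsed as JSON rather than evaluated as Python.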
Upvotes: 2