Ke.

Reputation: 2586

Python: create list of nested dicts

I'm using BeautifulSoup to read XML data and put it into a list of dicts. However, it doesn't work as expected: the same dict just gets added to the list each time. How can I make the correct dict get added to the list at the correct stage of the nested for loop?

The printed listy should look like the following:

[OrderedDict([('name', 'dogs'), ('type', 'housed'), ('value', '123')]),
 OrderedDict([('name', 'cats'), ('type', 'wild'), ('value', '456')]),
 OrderedDict([('name', 'mice'), ('type', 'housed'), ('value', '789')])]

Is it better to put it in a dict instead of a list?

Here is the XML:
<window>
    <window class="Obj" name="ray" type="housed">
        <animal name="dogs" value="123" />
        <species name="sdogs" value="s123" />
    </window>
    <window class="Obj" name="james" type="wild">
        <animal name="cats" type="wild" value="456" />
        <species name="scats" type="swild" value="s456" />
    </window>
    <window class="Obj" name="bob" type="housed">
        <animal name="mice" value="789" />
        <species name="smice" value="s789" />
    </window>
</window>

And here's the code (sorry if there are a few mistakes; I can correct them, as this is a simplified example taken from a larger program):

import sys
import pprint
from bs4 import BeautifulSoup as bs
from collections import OrderedDict

soup = bs(open("test.xml"),"lxml")
dicty = OrderedDict()
listy = [];
Objs=soup.findAll('window',{"class":"Obj"})

#print Objs
for Obj in Objs:
    Objarr =  OrderedDict()     #### move this down
    #I want to add data to the array here:
    #print Obj
    for child in Obj.children:
        Objarr.update({"namesss" : Obj['name']})
        if child.name is not None:
            if child.name == "species":
                print Obj['name']
                print child['value']
                #Also, adding data to the array here:
                Objarr.update({"name" : Obj['name']})
                Objarr.update({"type" : Obj['type']})
                Objarr.update({"value": child['name']})
    listy.append(Objarr)        #### dedent this

pprint.pprint(listy)

Upvotes: 0

Views: 158

Answers (2)

ettanany

Reputation: 19816

Look at the following to understand what your objs contains:

>>> soup = bs(open("my_xml.xml"),"lxml")
>>>
>>> objs = soup.findAll('window',{"class":"Obj"})
>>>
>>> for obj in objs:
...     for child in obj.children:
...         print child
...


<animal name="dogs" type="housed" value="123"></animal>


<animal name="cats" type="wild" value="456"></animal>


<animal name="mice" type="housed" value="789"></animal>


<window>
</window>

This means that the first child is a \n, the last child is <window>\n</window>, and a \n separates each pair of elements in between.

To solve this issue, convert the list iterator (obj.children) to a normal list with list(obj.children), then slice it with start 1, end -2, and step 2: list(obj.children)[1:-2:2]

This is the output in this case:

>>> for obj in objs:
...     for child in list(obj.children)[1:-2:2]:
...         print child
...
<animal name="dogs" type="housed" value="123"></animal>
<animal name="cats" type="wild" value="456"></animal>
<animal name="mice" type="housed" value="789"></animal>
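The slice above relies on the exact position of every whitespace node. As a rough sketch of the difference between selecting by position and selecting by tag name, the snippet below uses a hypothetical `Node` tuple as a stand-in for the parsed children (it is not a BeautifulSoup type). Text nodes get `name=None`, which is the same property that the `child.name is not None` check in the question's code relies on.

```python
from collections import namedtuple

# Hypothetical stand-in for parsed nodes (not a BeautifulSoup class):
# whitespace text nodes have name None, tags carry their tag name.
Node = namedtuple("Node", ["name", "text"])

children = [
    Node(None, "\n"),
    Node("animal", "dogs"),
    Node(None, "\n"),
    Node("animal", "cats"),
    Node(None, "\n"),
    Node("animal", "mice"),
    Node(None, "\n"),
    Node("window", ""),
]

# Positional selection, as in the answer: skip the first node,
# stop before the last two, and take every second node.
by_slice = children[1:-2:2]

# Name-based selection: keep only nodes whose tag name matches.
by_name = [child for child in children if child.name == "animal"]

print([child.text for child in by_slice])
print([child.text for child in by_name])
```

Both selections pick the same three nodes for this layout, but only the name-based one keeps working if the whitespace between elements changes.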

Upvotes: 1

stenci

Reputation: 8491

You are updating a single dictionary and appending it to the list, so you keep reusing the same dictionary again and again. You should create a new dictionary before the children loop begins and append it to the list after that loop, not inside it.

I guess something like this:

import sys
import pprint
from bs4 import BeautifulSoup as bs
from collections import OrderedDict

soup = bs(open("my.xml"),"lxml")
dicty = OrderedDict()
listy = [];
Objs=soup.findAll('window',{"class":"Obj"})
#print Objs
for Obj in Objs:
    Objarr =  OrderedDict()        #### move this down ####
    #I want to add data to the array here:
    for child in Obj.children:
        if child.name is not None:
            if child.name == "variable":
                #Also, adding data to the array here:
                Objarr.update({"name" : Obj['text']})
                Objarr.update({"type" : "matrix"})
                Objarr.update({"value": child['name']})
    listy.append(Objarr)           #### dedent this ####

pprint.pprint(listy)
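To see why reusing one dictionary fills the list with identical entries, here is a minimal sketch without any XML parsing. The `rows` tuples are an assumption: they simply mirror the values from the expected output in the question, and the variable names are illustrative only.

```python
from collections import OrderedDict

# Assumed sample data, taken from the expected output in the question.
rows = [("dogs", "housed", "123"), ("cats", "wild", "456"), ("mice", "housed", "789")]

# Problem pattern: one dict created outside the loop, so every list
# entry refers to the same object and shows the last values written.
shared = OrderedDict()
buggy = []
for name, kind, value in rows:
    shared.update({"name": name, "type": kind, "value": value})
    buggy.append(shared)

# Fixed pattern: a new dict is created on each pass of the outer loop
# and appended once it has been filled.
fixed = []
for name, kind, value in rows:
    record = OrderedDict()
    record["name"] = name
    record["type"] = kind
    record["value"] = value
    fixed.append(record)

print([d["name"] for d in buggy])
print([d["name"] for d in fixed])
```

In the first list every entry is the same object, so all three show the values from the final iteration; in the second, each entry keeps its own values.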

Upvotes: 1
