SIM

Reputation: 22440

Unable to store parsed items as dictionary

I've written a small Python script that uses BeautifulSoup to parse items out of the HTML stored in the content variable below. I would rather not append items to a previously created empty dictionary; instead, I want to build the dictionary on the fly. I tried but could not achieve that. Any help would be highly appreciated.

This is my attempt:

from bs4 import BeautifulSoup

content="""
<table class="data">
    <tbody>
        <tr class="blue">
            <td>hot</td>
            <td>cold</td>
        </tr>
        <tr>
            <td>day</td>
            <td>night</td>
        </tr>
    </tbody>
</table>
"""
soup = BeautifulSoup(content,'lxml')
for items in soup.select('tr'):
    data = [item.text for item in items.select("td")]
    dict_val = {data[0] : data[1]}
    print(dict_val)

The way I'm getting the output:

{'hot': 'cold'}
{'day': 'night'}

The way I expect to have the output:

{'hot': 'cold','day': 'night'}

Upvotes: 0

Views: 83

Answers (4)

QHarr

Reputation: 84465

Use nth-child selectors (bs4 4.7.1 or later) with a dictionary comprehension. This solution is specific to the example as shown.

from bs4 import BeautifulSoup

content="""
<table class="data">
    <tbody>
        <tr class="blue">
            <td>hot</td>
            <td>cold</td>
        </tr>
        <tr>
            <td>day</td>
            <td>night</td>
        </tr>
    </tbody>
</table>
"""
soup = BeautifulSoup(content,'lxml')
# Pair the first cell of each row (the key) with the second cell (the value).
result = {k.text: v.text for k, v in zip(soup.select('.data td:nth-child(odd)'), soup.select('.data td:nth-child(even)'))}
print(result)
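
When two selector results are zipped together, it can help to print what each selector matches before building the dictionary. A minimal sketch, assuming the same table and exactly two cells per row; html.parser is used here instead of lxml only to avoid the extra dependency, and the names keys and values are illustrative:

```python
from bs4 import BeautifulSoup

content = """
<table class="data">
    <tbody>
        <tr class="blue">
            <td>hot</td>
            <td>cold</td>
        </tr>
        <tr>
            <td>day</td>
            <td>night</td>
        </tr>
    </tbody>
</table>
"""

# Assumption: the standard-library parser is good enough for this small table.
soup = BeautifulSoup(content, "html.parser")

# First cell of every row, then second cell of every row.
keys = [td.text for td in soup.select(".data td:nth-child(1)")]
values = [td.text for td in soup.select(".data td:nth-child(2)")]

# Inspect both lists before pairing them up.
print(keys)
print(values)

result = dict(zip(keys, values))
print(result)
```

If the two printed lists do not line up row by row, the selectors are picking cells by row rather than by column.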

Upvotes: 0

Ajax1234

Reputation: 71451

You can create a dictionary outside the for-loop:

soup = BeautifulSoup(content,'lxml')
d = {}
for items in soup.select('tr'):
    data = [item.text for item in items.select("td")]
    d[data[0]] = data[1]
print(d)

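Or, you can build the dictionary without an explicit loop by pairing up consecutive cells:

from bs4 import BeautifulSoup
# Collect the text of every <td> in document order.
s = [i.text for i in BeautifulSoup(content, 'lxml').find_all('td')]
# Each two-item slice becomes one key/value pair.
new_s = dict(s[i:i+2] for i in range(0, len(s), 2))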

Output:

{'hot': 'cold', 'day': 'night'}
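
The pairing step can be tried without any HTML at all. A small sketch, assuming the cell texts have already been collected into a flat list (the name cells and its contents are illustrative, copied from the example table):

```python
# Flat list of cell texts in document order, as a scan over every <td> would give.
cells = ["hot", "cold", "day", "night"]

# Items at even positions become keys, items at odd positions become values.
pairs = dict(zip(cells[::2], cells[1::2]))
print(pairs)
```

One difference worth knowing: zip silently drops a trailing cell that has no partner, while the slice-based version raises a ValueError on an odd number of cells.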

Upvotes: 2

akimul

Reputation: 329

You need to initialize the dictionary before the for loop. In your code, the line dict_val = {data[0] : data[1]} creates a new dictionary on every iteration, so only one pair is ever stored at a time. You can try the following code:

from bs4 import BeautifulSoup

content="""
<table class="data">
    <tbody>
        <tr class="blue">
            <td>hot</td>
            <td>cold</td>
        </tr>
        <tr>
            <td>day</td>
            <td>night</td>
        </tr>
    </tbody>
</table>
"""
soup = BeautifulSoup(content,'lxml')
dict_val = {}
for items in soup.select('tr'):
    data = [item.text for item in items.select("td")]
    dict_val[data[0]] = data[1]
print(dict_val)
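
If some rows might have fewer than two td cells (a header row built from th elements, for example), data[0] or data[1] would raise an IndexError. A hedged variant, assuming such rows should simply be skipped; the extra orphan row, get_text(strip=True), and html.parser are additions for illustration, not part of the original question:

```python
from bs4 import BeautifulSoup

content = """
<table class="data">
    <tbody>
        <tr class="blue"><td>hot</td><td>cold</td></tr>
        <tr><td>day</td><td>night</td></tr>
        <tr><td>orphan</td></tr>
    </tbody>
</table>
"""

# Assumption: the standard-library parser instead of lxml.
soup = BeautifulSoup(content, "html.parser")

# One list of stripped cell texts per row (lazy generator).
rows = ([td.get_text(strip=True) for td in tr.select("td")] for tr in soup.select("tr"))

# Keep only rows that have both a key cell and a value cell.
dict_val = {cells[0]: cells[1] for cells in rows if len(cells) >= 2}
print(dict_val)
```

The dictionary comprehension still produces a single dictionary, so nothing needs to be initialized before a loop.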

Upvotes: 1

Code-Apprentice

Reputation: 83527

Remember that a computer will only do exactly what you tell it to. Your original code has this line:

dict_val = {data[0] : data[1]}

This creates a new dictionary every time the loop iterates. If, instead, you want to create a single dictionary and add elements to it, you need to do just that. Often it helps to write out the steps in words:

create a dictionary
for each row in the table:
    parse the <td> elements from the row
    add an entry to the dictionary

Most of this you have already translated into Python. The key differences are where the dictionary is created and how the data from the HTML is inserted into the dictionary. I will leave the details of how to do this in Python as an exercise. (Hint: look at the other answer.) The important thing here is thinking clearly about the order of the steps you want to perform and then figuring out how to do it in Python.

Upvotes: 1
