Reputation: 22440
I've written a tiny script in Python using BeautifulSoup to parse some items out of the HTML stored in the content variable in the script below. I would rather not create an empty dictionary first and then add items to it; instead, I want to build the dictionary on the fly. I tried but could not get that to work. Any help would be highly appreciated.
This is my attempt:
from bs4 import BeautifulSoup
content="""
<table class="data">
<tbody>
<tr class="blue">
<td>hot</td>
<td>cold</td>
</tr>
<tr>
<td>day</td>
<td>night</td>
</tr>
</tbody>
</table>
"""
soup = BeautifulSoup(content,'lxml')
for items in soup.select('tr'):
    data = [item.text for item in items.select("td")]
    dict_val = {data[0] : data[1]}
    print(dict_val)
The output I'm getting:
{'hot': 'cold'}
{'day': 'night'}
The output I expect:
{'hot': 'cold','day': 'night'}
Upvotes: 0
Views: 83
Reputation: 84465
Using nth-child (with bs4 4.7.1) and a dictionary comprehension. This solution is specific to the example as shown.
from bs4 import BeautifulSoup
content="""
<table class="data">
<tbody>
<tr class="blue">
<td>hot</td>
<td>cold</td>
</tr>
<tr>
<td>day</td>
<td>night</td>
</tr>
</tbody>
</table>
"""
soup = BeautifulSoup(content,'lxml')
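# td:nth-child(odd) matches the first cell of each row (the keys),
# td:nth-child(even) matches the second cell of each row (the values)
result = {k.text: v.text for (k, v) in zip(soup.select('.data td:nth-child(odd)'), soup.select('.data td:nth-child(even)'))}
print(result)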
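The nth-child selectors above depend on the exact table layout. A less layout-specific variant of the same dictionary-comprehension idea iterates over the rows instead and takes the first and second <td> of each row as key and value. This is a minimal sketch, assuming Python 3.7+ (insertion-ordered dicts) and that every data row holds its key in the first cell and its value in the second; it also swaps in the built-in "html.parser" backend (an assumption, so no lxml install is needed).

```python
from bs4 import BeautifulSoup

content = """
<table class="data">
<tbody>
<tr class="blue">
<td>hot</td>
<td>cold</td>
</tr>
<tr>
<td>day</td>
<td>night</td>
</tr>
</tbody>
</table>
"""

# Assumption: the stdlib "html.parser" backend; 'lxml' behaves the same for this input
soup = BeautifulSoup(content, "html.parser")

# Build the whole dictionary in one expression: each row contributes one key/value pair
dict_val = {
    cells[0].text: cells[1].text
    for cells in (row.select("td") for row in soup.select("tr"))
    if len(cells) >= 2  # skip rows that do not have at least two <td> cells
}
print(dict_val)  # {'hot': 'cold', 'day': 'night'}
```

Rows with fewer than two <td> cells (for example a header row built from <th>) are skipped by the if clause instead of raising an IndexError.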
Upvotes: 0
Reputation: 71451
You can create the dictionary once, outside the for-loop, and add an entry to it on each iteration:
soup = BeautifulSoup(content,'lxml')
d = {}
for items in soup.select('tr'):
    data = [item.text for item in items.select("td")]
    d[data[0]] = data[1]
print(d)
Or, you can create a dictionary in one line:
from bs4 import BeautifulSoup as soup
s = [i.text for i in soup(content, 'lxml').find_all('td')]
new_s = dict([s[i:i+2] for i in range(0, len(s), 2)])
print(new_s)
Output:
{u'hot': u'cold', u'day': u'night'}
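The slicing step above pairs consecutive cells. A common equivalent idiom passes the same iterator to zip() twice, so each step of zip pulls one key and then one value. This is a minimal sketch, assuming Python 3 (so the output has no u'' prefixes) and the built-in "html.parser" backend in place of lxml.

```python
from bs4 import BeautifulSoup

content = """
<table class="data">
<tbody>
<tr class="blue">
<td>hot</td>
<td>cold</td>
</tr>
<tr>
<td>day</td>
<td>night</td>
</tr>
</tbody>
</table>
"""

soup = BeautifulSoup(content, "html.parser")
cells = [td.text for td in soup.find_all("td")]  # ['hot', 'cold', 'day', 'night']

# zip() draws from the same iterator twice per tuple, so consecutive items become (key, value)
it = iter(cells)
new_s = dict(zip(it, it))
print(new_s)  # {'hot': 'cold', 'day': 'night'}
```

One behavioral difference: with an odd number of cells, the slicing version produces a one-element list and dict() raises ValueError, whereas dict(zip(it, it)) silently drops the trailing unpaired cell.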
Upvotes: 2
Reputation: 329
You need to initialize the dictionary before the for loop. In your code, a new dictionary is created on every iteration by this line: dict_val = {data[0] : data[1]}. You can try the following code:
from bs4 import BeautifulSoup
content="""
<table class="data">
<tbody>
<tr class="blue">
<td>hot</td>
<td>cold</td>
</tr>
<tr>
<td>day</td>
<td>night</td>
</tr>
</tbody>
</table>
"""
soup = BeautifulSoup(content,'lxml')
dict_val = {}
for items in soup.select('tr'):
    data = [item.text for item in items.select("td")]
    dict_val[data[0]] = data[1]
print(dict_val)
Upvotes: 1
Reputation: 83527
Remember that a computer will only do exactly what you tell it to. Your original code has this line:
dict_val = {data[0] : data[1]}
This creates a new dictionary every time the loop iterates. If instead you want to create a single dictionary and add elements to it, you need to do just that. Often it helps to write out the steps in words:
create a dictionary
for each row in the table:
    parse the <td> elements from the row
    add an entry to the dictionary
Most of this you have already translated into Python. The key differences are where the dictionary is created and how the data from the HTML is inserted into the dictionary. I will leave the details of how to do this in Python as an exercise. (Hint: look at the other answer.) The important thing here is thinking clearly about the order of the steps you want to perform and then figuring out how to do it in Python.
Upvotes: 1