Reputation: 466
I'm new to BeautifulSoup and I'm struggling to parse data from a table:
<table id="data">
<tr>
<td class="random.data"></td>
<td class="name"></td>
<td class="values"></td> <!-- 0 -->
<td class="values"></td> <!-- 1 -->
<td class="values"></td> <!-- 2 -->
<td class="values"></td> <!-- 3 -->
</tr>
<tr>
<td class=".random_data"></td>
<td class="name"></td>
<td class="values"></td> <!-- 0 -->
<td class="values"></td> <!-- 1 -->
<td class="values"></td> <!-- 2 -->
<td class="values"></td> <!-- 3 -->
</tr>
</table>
I want to create a list of dictionaries like this pseudocode:
content = []
for tr in trs:
    info = {
        'name': tr.getChildren('.name').getText(),
        'value1': tr.getChildren('.values', 0).getText(),  # the first value from values
        'value3': tr.getChildren('.values', 3).getText()   # the fourth value from values
    }
    content.append(info)
I've tried several approaches but can't translate this into BeautifulSoup. Any help or hints?
Upvotes: 1
Views: 99
Reputation: 473853
The idea is to iterate over the table rows and, for every row, find the name cell by the "name" class, find all the value cells by the "values" class, and get the desired values by index:
from bs4 import BeautifulSoup
data = """
<table id="data">
<tr>
<td class="random.data"></td>
<td class="name">test1</td>
<td class="values">0</td>
<td class="values">1</td>
<td class="values">2</td>
<td class="values">3</td>
</tr>
<tr>
<td class=".random_data"></td>
<td class="name">test2</td>
<td class="values">0</td>
<td class="values">1</td>
<td class="values">2</td>
<td class="values">3</td>
</tr>
</table>
"""
# Name the parser explicitly so the result does not depend on which parsers are installed
soup = BeautifulSoup(data, "html.parser")
content = []
for row in soup.select("table#data tr"):
    # One name cell per row, plus every cell with the "values" class
    name = row.find("td", class_="name").get_text(strip=True)
    values = row.find_all("td", class_="values")
    content.append({
        "name": name,
        "value1": values[0].get_text(strip=True),  # first value
        "value3": values[3].get_text(strip=True)   # fourth value
    })
print(content)
Prints:
[
{'name': 'test1', 'value1': '0', 'value3': '3'},
{'name': 'test2', 'value1': '0', 'value3': '3'}
]
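As a variation on the approach above, here is a minimal sketch that uses CSS selectors throughout (closer to the ".name" / ".values" lookups in the question's pseudocode) and skips rows that lack a name cell or have fewer than four value cells. The names extract_rows and html are hypothetical, and skipping incomplete rows is an assumption about the desired behavior; you may prefer to raise an error or fill in None instead.

```python
from bs4 import BeautifulSoup

# Sample markup (hypothetical): the second row is deliberately incomplete
html = """
<table id="data">
  <tr>
    <td class="name">test1</td>
    <td class="values">0</td>
    <td class="values">1</td>
    <td class="values">2</td>
    <td class="values">3</td>
  </tr>
  <tr>
    <td class="name">test2</td>
    <td class="values">0</td>
  </tr>
</table>
"""


def extract_rows(markup):
    """Return one dict per complete row of the table with id "data"."""
    soup = BeautifulSoup(markup, "html.parser")
    content = []
    for tr in soup.select("table#data tr"):
        name_cell = tr.select_one("td.name")   # None if the row has no name cell
        value_cells = tr.select("td.values")   # list of matching cells, possibly empty
        # Assumption: rows without a name or with fewer than 4 values are skipped
        if name_cell is None or len(value_cells) < 4:
            continue
        content.append({
            "name": name_cell.get_text(strip=True),
            "value1": value_cells[0].get_text(strip=True),  # first value
            "value3": value_cells[3].get_text(strip=True),  # fourth value
        })
    return content


rows = extract_rows(html)
print(rows)
```

With the sample markup, only the first row is complete, so rows contains a single dictionary for test1. The guard avoids the AttributeError or IndexError that indexing directly would raise on a header row or a short row.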
Upvotes: 1