Reputation: 635
I am trying to go through a website and extract some information using Chromedriver. The problem I have when I use BeautifulSoup is that I can't find a way to extract a table inside a class.
The way I am trying to extract the information looks like this:
results = soup.find_all("div", class_="widget widgetLarge fpPerfglissanteclassique")
Is there a way to change this line so that it only returns the information in <td>...</td> that can be found inside the class?
Thanks for your answers in advance!
Upvotes: 1
Views: 916
Reputation: 2086
Your results variable contains a ResultSet (a list of Tag objects), which you can iterate through, calling find and find_all on the individual result items.
Like this:
from bs4 import BeautifulSoup
html = """
<div class="widget widgetLarge fpPerfglissanteclassique">
<td>item 1</td>
<td>item 2</td>
<td>item 3</td>
</div>
<div class="widget widgetLarge fpPerfglissanteclassique">
<td>item 4</td>
<td>item 5</td>
<td>item 6</td>
</div>
"""
soup = BeautifulSoup(html, "html.parser")
results = soup.find_all("div", class_="widget widgetLarge fpPerfglissanteclassique")
for result in results:
    table_results = result.find_all("td")
    print(table_results)
Result:
[<td>item 1</td>, <td>item 2</td>, <td>item 3</td>]
[<td>item 4</td>, <td>item 5</td>, <td>item 6</td>]
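If you would rather get the cells in a single call, a CSS selector is one option. This is only a sketch: it assumes a BeautifulSoup version that ships select() with full selector support (4.7 or later), and the sample HTML, including the "other" div, is made up for illustration. Unlike passing the whole class string to class_, which matches the class attribute as an exact string, the selector matches any div that carries all three classes, in any order.

```python
from bs4 import BeautifulSoup

# Made-up sample: one matching div and one div that should be skipped.
html = """
<div class="widget widgetLarge fpPerfglissanteclassique">
<table><tr><td>item 1</td><td>item 2</td></tr></table>
</div>
<div class="other"><table><tr><td>ignored</td></tr></table></div>
"""

soup = BeautifulSoup(html, "html.parser")

# "div.a.b.c td" selects every <td> nested anywhere inside a div
# that has all three classes.
cells = soup.select("div.widget.widgetLarge.fpPerfglissanteclassique td")

# Keep only the text of each cell, with surrounding whitespace removed.
texts = [cell.get_text(strip=True) for cell in cells]
print(texts)
```

This returns one flat list of cells across all matching divs, so use the loop above instead if you need to keep the cells grouped per div.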
Upvotes: 3
Reputation: 195438
If the table is inside this class, you can extract its data like this:
from bs4 import BeautifulSoup
html = """
<div class="widget widgetLarge fpPerfglissanteclassique">
<table>
<tr>
<td>1</td><td>2</td><td>3</td>
</tr>
<tr>
<td>4</td><td>5</td><td>6</td>
</tr>
</table>
</div>
"""
soup = BeautifulSoup(html, "html.parser")
results = soup.find_all(
"div", class_="widget widgetLarge fpPerfglissanteclassique"
)
for result in results:  # <-- iterate every result
    for row in result.find_all("tr"):  # <-- find all rows
        cell_data = []
        for cell in row.find_all("td"):  # <-- find all cells inside row
            cell_data.append(cell.text)
        print(*cell_data)
Prints:
1 2 3
4 5 6
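If you want to keep the data instead of printing it, the same loops can be wrapped in a small helper that returns one list per row. This is a rough sketch: table_rows is a hypothetical name, not part of BeautifulSoup, and it assumes that matching on the single class fpPerfglissanteclassique is specific enough for your page.

```python
from bs4 import BeautifulSoup


def table_rows(html, css_class):
    """Return the text of every <td>, grouped by <tr>, inside matching divs.

    Hypothetical helper for illustration only.
    """
    soup = BeautifulSoup(html, "html.parser")
    rows = []
    # Passing a single class name to class_ matches any div that has
    # this class among its classes.
    for div in soup.find_all("div", class_=css_class):
        for tr in div.find_all("tr"):
            rows.append([td.get_text(strip=True) for td in tr.find_all("td")])
    return rows


sample = """
<div class="widget widgetLarge fpPerfglissanteclassique">
<table>
<tr><td>1</td><td>2</td><td>3</td></tr>
<tr><td>4</td><td>5</td><td>6</td></tr>
</table>
</div>
"""

rows = table_rows(sample, "fpPerfglissanteclassique")
print(rows)
```

With Chromedriver you would pass the page source from the driver as the html argument instead of the sample string.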
Upvotes: 1