Mo711

Reputation: 635

Extract <td> Elements using BS4

I am trying to go through a website and extract some information using Chromedriver. The problem I have with BeautifulSoup is that I can't find a way to extract a table that sits inside a div with a particular class.

The way I am trying to extract the information looks like this:

results = soup.find_all("div", class_="widget widgetLarge fpPerfglissanteclassique")

Is there a way to change this line so that it only returns the information in <td>...</td> elements found inside that class?

Thanks for your answers in advance!

Upvotes: 1

Views: 916

Answers (2)

ptts

Reputation: 2086

Your results variable is a ResultSet, a list-like collection of BeautifulSoup Tag objects. You can iterate through it and call find and find_all on each individual result item.

Like this:

from bs4 import BeautifulSoup

html = """
<div class="widget widgetLarge fpPerfglissanteclassique">
    <td>item 1</td>
    <td>item 2</td>
    <td>item 3</td>
</div>
<div class="widget widgetLarge fpPerfglissanteclassique">
    <td>item 4</td>
    <td>item 5</td>
    <td>item 6</td>
</div>
"""

soup = BeautifulSoup(html, "html.parser")
results = soup.find_all("div", class_="widget widgetLarge fpPerfglissanteclassique")

for result in results:
    table_results = result.find_all("td")
    print(table_results)

Result:

[<td>item 1</td>, <td>item 2</td>, <td>item 3</td>]
[<td>item 4</td>, <td>item 5</td>, <td>item 6</td>]
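
As a possible shortcut, the same cells can be reached in one call with a CSS selector. This is only a sketch: it assumes a bs4 version with select() support for chained class selectors (via the bundled soupsieve package), and the sample HTML and the "other" div are made up for illustration. Unlike the exact class_ string above, the selector matches the classes in any order.

```python
from bs4 import BeautifulSoup

# Made-up sample markup; the second div is there to show it is skipped.
html = """
<div class="widget widgetLarge fpPerfglissanteclassique">
    <table>
        <tr><td>item 1</td><td>item 2</td></tr>
    </table>
</div>
<div class="other">
    <table><tr><td>ignored</td></tr></table>
</div>
"""

soup = BeautifulSoup(html, "html.parser")

# Every <td> that is a descendant of a div carrying all three classes.
cells = soup.select("div.widget.widgetLarge.fpPerfglissanteclassique td")

# Keep only the text of each cell, with surrounding whitespace stripped.
texts = [cell.get_text(strip=True) for cell in cells]
print(texts)
```

This returns one flat list of cells across all matching divs, so use the loop above instead if you need to keep the results grouped per div.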

Upvotes: 3

Andrej Kesely

Reputation: 195438

If the table is inside an element with this class, here is an example of how to get the data from it:

from bs4 import BeautifulSoup

html = """
<div class="widget widgetLarge fpPerfglissanteclassique">
    <table>
        <tr>
            <td>1</td><td>2</td><td>3</td>
        </tr>
        <tr>
            <td>4</td><td>5</td><td>6</td>
        </tr>
    </table>
</div>
"""

soup = BeautifulSoup(html, "html.parser")

results = soup.find_all(
    "div", class_="widget widgetLarge fpPerfglissanteclassique"
)

for result in results:  # <-- iterate every result
    for row in result.find_all("tr"):  # <-- find all rows
        cell_data = []
        for cell in row.find_all("td"):  # <-- find all cells inside row
            cell_data.append(cell.text)
        print(*cell_data)

Prints:

1 2 3
4 5 6
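
If you would rather keep the data than print it, the same loops can be collected into nested lists. A minimal sketch, assuming the same sample HTML as above; extract_rows is a hypothetical helper name, not part of bs4.

```python
from bs4 import BeautifulSoup

html = """
<div class="widget widgetLarge fpPerfglissanteclassique">
    <table>
        <tr>
            <td>1</td><td>2</td><td>3</td>
        </tr>
        <tr>
            <td>4</td><td>5</td><td>6</td>
        </tr>
    </table>
</div>
"""

soup = BeautifulSoup(html, "html.parser")


def extract_rows(container):
    """Hypothetical helper: one list of cell texts per <tr> in container."""
    return [
        [cell.get_text(strip=True) for cell in row.find_all("td")]
        for row in container.find_all("tr")
    ]


# One entry per matching div, each holding that div's rows.
tables = [
    extract_rows(div)
    for div in soup.find_all(
        "div", class_="widget widgetLarge fpPerfglissanteclassique"
    )
]
print(tables)
```

Each inner list is one table row, so the result can be passed on to csv.writer or similar without re-parsing.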

Upvotes: 1
