naman shah

Reputation: 11

How to extract the text from the following HTML code?

I am doing web scraping for a DS project using BeautifulSoup, but I am unable to extract the Duration value from the "tbody" tag of the table with class "table". The HTML is:

<div class="table-responsive">
    <table class="table">
        <thead>
            <tr>
                <th>Start Date</th>
                <th>Duration</th>
                <th>Stipend</th>
                <th>Posted On</th>
                <th>Apply By</th>
            </tr>
        </thead>
        <tbody>
            <tr>
                <td>
                    <div id="start-date-first">Immediately</div>
                </td>
                <td>1 Month</td>
                <td class="stipend_container_table_cell"> <i class="fa fa-inr"></i>
                1500 /month
                </td>
                <td>26 May'20</td>
                <td>23 Jun'20</td>
            </tr>
        </tbody>
    </table>
</div>

Note: to extract the 'Immediately' text, I use the following code:

x = container.find("div", {"class" : "table-responsive"})
x.table.tbody.tr.td.div.text

Upvotes: 0

Views: 57

Answers (2)

Andre Nevares

Reputation: 741

Try this:

from bs4 import BeautifulSoup
import requests

url = "yourUrlHere"

pageRaw = requests.get(url).text
soup = BeautifulSoup(pageRaw, 'lxml')
print(soup.table)

My code uses the lxml library as the parser. If you don't have it, install it with pip install lxml, or replace 'lxml' with the parser you already use in this part of the code:

soup = BeautifulSoup(pageRaw, 'lxml')

soup.table returns the first table on the page.
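To get from the first table to the Duration cell, one option is to collect all of its td tags and index into them. Attribute access such as .td only ever returns the first matching tag, which is why the chain in the question stops at 'Immediately'. The following is a minimal sketch, not a definitive solution: it parses a shortened copy of the HTML from the question as an inline string instead of fetching a URL, and it assumes the built-in 'html.parser' so that nothing besides bs4 needs to be installed.

```python
from bs4 import BeautifulSoup

# Shortened copy of the table from the question (assumption: the real
# page has the same structure).
html = """
<div class="table-responsive">
    <table class="table">
        <thead>
            <tr><th>Start Date</th><th>Duration</th><th>Stipend</th></tr>
        </thead>
        <tbody>
            <tr>
                <td><div id="start-date-first">Immediately</div></td>
                <td>1 Month</td>
                <td class="stipend_container_table_cell"> <i class="fa fa-inr"></i>
                1500 /month
                </td>
            </tr>
        </tbody>
    </table>
</div>
"""

soup = BeautifulSoup(html, "html.parser")

first_table = soup.table             # first <table> in the document
cells = first_table.find_all("td")   # every <td>, in document order

# The Duration cell is the second <td>, so it sits at index 1.
duration = cells[1].get_text(strip=True)
print(duration)  # 1 Month
```

If the page has more than one table, soup.table may not be the one you want; in that case, narrow the search first, for example with soup.find("table", {"class": "table"}).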

Take care

Upvotes: 0

studio-luke

Reputation: 128

You can use the select() method to find tags by CSS selector.

tds = container.select('div > table > tbody > tr > td')
# or just select('td'), since there's no other td tag

print(tds[1].text)

select() returns a list of all HTML tags that match the selector. The cell you want is the second one, so take index 1 and read its text.
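Indexing by position breaks if the column order ever changes. As a possible variation on the same select() idea, you can pair each header with its cell and look the value up by name. This is only a sketch: the HTML is pasted inline from the question, container is rebuilt here from that string, and 'html.parser' is an assumed parser choice.

```python
from bs4 import BeautifulSoup

# HTML copied from the question so the example runs on its own.
html = """
<div class="table-responsive">
    <table class="table">
        <thead>
            <tr>
                <th>Start Date</th>
                <th>Duration</th>
                <th>Stipend</th>
                <th>Posted On</th>
                <th>Apply By</th>
            </tr>
        </thead>
        <tbody>
            <tr>
                <td>
                    <div id="start-date-first">Immediately</div>
                </td>
                <td>1 Month</td>
                <td class="stipend_container_table_cell"> <i class="fa fa-inr"></i>
                1500 /month
                </td>
                <td>26 May'20</td>
                <td>23 Jun'20</td>
            </tr>
        </tbody>
    </table>
</div>
"""

# Stand-in for the `container` object used in the question.
container = BeautifulSoup(html, "html.parser")

# Header texts and cell texts, both in column order.
headers = [th.get_text(strip=True)
           for th in container.select("table.table > thead > tr > th")]
values = [td.get_text(strip=True)
          for td in container.select("table.table > tbody > tr > td")]

# Map each column name to its value in the (single) body row.
row = dict(zip(headers, values))
print(row["Duration"])  # 1 Month
```

This version assumes the table has exactly one body row, as in the question. For several rows, you would loop over select("tbody > tr") and build one dictionary per row.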

Upvotes: 3
