NRVA

Reputation: 507

Converting a BeautifulSoup scraped table column to a list

Scraping a column from a Wikipedia table with BeautifulSoup returns only the last row, but I want all of the rows in a list:

from urllib.request import urlopen
from bs4 import BeautifulSoup

site = "https://en.wikipedia.org/wiki/Agriculture_in_India"
html = urlopen(site)
soup = BeautifulSoup(html, "html.parser")
table = soup.find("table", {'class': 'wikitable sortable'})

for row in table.find_all("tr")[1:]:
    col = row.find_all("td")
    if len(col) > 0:
        com = str(col[1].string.strip("\n"))

        list(com)
com

Out: 'ZTS'

So it only shows the last row, whereas I was expecting a list with each row's value as a string, so that I can assign the list to a new variable:

"Rice", "Buffalo milk", "Cow milk", "Wheat"

Can anyone help me?

Upvotes: 1

Views: 1359

Answers (1)

humble

Reputation: 2168

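Your method does not work because nothing is ever added to com: each iteration reassigns it, so only the last row's value survives the loop. The list(com) call does not help either, since its result is discarded (and calling list() on a string splits it into single characters).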
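To see the difference without any network access, here is a minimal sketch. The `cells` list is a hypothetical stand-in for the scraped column, not data taken from the page:

```python
# Hypothetical cell values standing in for the scraped column.
cells = ["Rice\n", "Buffalo milk\n", "Cow milk\n", "Wheat\n"]

# Reassigning inside the loop keeps only the last value.
for cell in cells:
    last_only = cell.strip("\n")

# list() on a string splits it into characters; it does not collect rows.
chars = list(last_only)

# Appending to a list created before the loop keeps every value.
collected = []
for cell in cells:
    collected.append(cell.strip("\n"))

print(last_only)
print(chars)
print(collected)
```

The same pattern applies to the scraping loop: create the list once before the loop and append to it inside.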

One way to do what you desire is:

from urllib.request import urlopen
from bs4 import BeautifulSoup

site = "https://en.wikipedia.org/wiki/Agriculture_in_India"
html = urlopen(site)
soup = BeautifulSoup(html, "html.parser")
table = soup.find("table", {'class': 'wikitable sortable'})

com = []  # collects one value per table row
for row in table.find_all("tr")[1:]:  # skip the header row
    col = row.find_all("td")
    if len(col) > 0:
        temp = col[1].contents[0]  # first child of the second cell
        try:
            # If the child is a tag (e.g. a link), take the text inside it.
            to_append = temp.contents[0]
        except Exception:
            # Otherwise the child is already the text.
            to_append = temp
        com.append(to_append)

print(com)

This will give you what you require.

Explanation

col[1].contents[0] gives the first child of the tag. .contents returns a list of the tag's children; here there is a single child, hence index 0.
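As a small illustration of .contents, the sketch below parses a hypothetical one-row table (made up for this example, not copied from the Wikipedia page) and inspects the children of its second cell:

```python
from bs4 import BeautifulSoup

# Hypothetical one-row table, not taken from the Wikipedia page.
html = "<table><tr><td>1</td><td>Rice</td></tr></table>"
soup = BeautifulSoup(html, "html.parser")

cells = soup.find("tr").find_all("td")
children = cells[1].contents   # list of the tag's direct children
first_child = children[0]      # the only child here: a text node

print(children)
print(first_child)
```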

In some cases, the content inside the <td> tag is an <a href> link, so I apply another .contents[0] to get the text.

In other cases it is not a link. For that I used a try/except block: if the extracted child has no children of its own, we get an exception, and the child itself is used as the text.
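As an alternative to the try/except approach (a different technique, not the one used in the code above), get_text() returns a cell's text whether or not it is wrapped in a link. A minimal sketch, assuming a small hypothetical table rather than the live Wikipedia page:

```python
from bs4 import BeautifulSoup

# Hypothetical rows: one linked cell, one plain-text cell.
html = """
<table>
<tr><td>1</td><td><a href="/wiki/Rice">Rice</a></td></tr>
<tr><td>2</td><td>Buffalo milk</td></tr>
</table>
"""
soup = BeautifulSoup(html, "html.parser")

names = []
for row in soup.find_all("tr"):
    cell = row.find_all("td")[1]
    # get_text() joins all text inside the cell, linked or not;
    # strip=True trims surrounding whitespace.
    names.append(cell.get_text(strip=True))

print(names)
```

This yields plain Python strings, which may be more convenient than the text nodes returned by .contents if the values are processed further.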

See the official Beautiful Soup documentation for details.

Upvotes: 2
