Zach

Reputation: 441

Beautiful Soup: Scrape Table Data

I'm looking to extract table data from the URL below. Specifically, I would like to extract the data in the first column. When I run the code below, each value in the first column is printed multiple times. How can I get each value to print only once, as it appears in the table?

from urllib.request import urlopen
from bs4 import BeautifulSoup
html = urlopen('http://www.pythonscraping.com/pages/page3.html').read()
soup = BeautifulSoup(html, 'lxml')
table = soup.find('table',{'id':'giftList'})

rows = table.find_all('tr')

for row in rows:
    data = row.find_all('td')
    for cell in data:
        print(data[0].text)

Upvotes: 1

Views: 2556

Answers (2)

SIM

Reputation: 22440

Using the requests module together with CSS selectors, you can also try the following:

import requests
from bs4 import BeautifulSoup

link = 'http://www.pythonscraping.com/pages/page3.html'

soup = BeautifulSoup(requests.get(link).text, 'lxml')
# [1:] skips the header row, which has no <td> cells
for row in soup.select('table#giftList tr')[1:]:
    cell = row.select_one('td').get_text(strip=True)  # first column only
    print(cell)

Output:

Vegetable Basket
Russian Nesting Dolls
Fish Painting
Dead Parrot
Mystery Box
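The same selector approach can be tried offline. The sketch below parses a small inline HTML string, a hypothetical stand-in for the `giftList` table (assumed to have one header row of `<th>` cells followed by data rows of `<td>` cells; the other cell values are placeholders). It compares the slicing approach above with a `tr > td:first-child` selector, which skips the header row on its own because that row's first child is a `<th>`. It uses the built-in `'html.parser'` instead of `lxml`, so no extra parser needs to be installed.

```python
from bs4 import BeautifulSoup

# Hypothetical stand-in for the giftList table: one header row (<th>)
# followed by data rows (<td>). Non-title cells are placeholders.
HTML = """
<table id="giftList">
  <tr><th>Item Title</th><th>Description</th><th>Cost</th></tr>
  <tr><td>Vegetable Basket</td><td>description</td><td>cost</td></tr>
  <tr><td>Fish Painting</td><td>description</td><td>cost</td></tr>
</table>
"""


def first_column_sliced(html):
    """Skip the header row by slicing, then take the first <td> of each row."""
    soup = BeautifulSoup(html, "html.parser")
    return [row.select_one("td").get_text(strip=True)
            for row in soup.select("table#giftList tr")[1:]]


def first_column_selector(html):
    """Select only <td> elements that are the first child of their row.
    The header row's first child is a <th>, so it never matches."""
    soup = BeautifulSoup(html, "html.parser")
    return [td.get_text(strip=True)
            for td in soup.select("table#giftList tr > td:first-child")]


print(first_column_sliced(HTML))
print(first_column_selector(HTML))
```

The selector version avoids relying on the header being exactly one row, at the cost of a slightly more specific CSS expression.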

Upvotes: 2

Ozzy Walsh

Reputation: 887

Try this:

from urllib.request import urlopen
from bs4 import BeautifulSoup
html = urlopen('http://www.pythonscraping.com/pages/page3.html').read()
soup = BeautifulSoup(html, 'lxml')
table = soup.find('table',{'id':'giftList'})

rows = table.find_all('tr')

for row in rows:
    data = row.find_all('td')

    # The header row has no <td> cells, so skip rows where data is empty
    if data:
        cell = data[0]  # first column only
        print(cell.text)
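A sketch of why the original loop repeats each value: the inner `for cell in data` loop runs once per `<td>` in the row, but its body always prints `data[0]`, so each first-column value appears as many times as its row has cells. Dropping the inner loop fixes that, and the emptiness check is needed because the header row has no `<td>` cells, so `data[0]` would raise an `IndexError` there. The code below reproduces both loops on a small hypothetical three-column table (not the real page) and collects results into lists instead of printing.

```python
from bs4 import BeautifulSoup

# Hypothetical three-column table; not the real page's contents.
HTML = """
<table id="giftList">
  <tr><th>Item Title</th><th>Description</th><th>Cost</th></tr>
  <tr><td>Vegetable Basket</td><td>description</td><td>cost</td></tr>
  <tr><td>Dead Parrot</td><td>description</td><td>cost</td></tr>
</table>
"""


def get_rows(html):
    soup = BeautifulSoup(html, "html.parser")
    return soup.find("table", id="giftList").find_all("tr")


def original_loop(html):
    out = []
    for row in get_rows(html):
        data = row.find_all("td")
        for cell in data:             # runs once per <td> in the row...
            out.append(data[0].text)  # ...but always reads the first cell
    return out


def fixed_loop(html):
    out = []
    for row in get_rows(html):
        data = row.find_all("td")
        if data:                      # header row has only <th>, so data is empty
            out.append(data[0].text)
    return out


print(original_loop(HTML))
print(fixed_loop(HTML))
```

With three cells per row, the original loop yields each title three times, while the fixed loop yields each title once.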

Upvotes: 2
