Asmita

Reputation: 13

Scraping a Table using BeautifulSoup in Jupyter Notebook

I'm trying to print a table of baby names as a single list using BeautifulSoup.

google-python-exercises/google-python-exercises/babynames/baby1990.html (HTML page is a screenshot of actual URL)

After fetching the table using urllib.request and parsing it with BeautifulSoup, I was able to print the data inside every row of the table but I'm getting the wrong output.

Here is my code:

right_table = soup.find('table', attrs={"summary": "Popularity for top 1000"})
table_rows = right_table.find_all('tr')

for tr in table_rows:
    td = tr.find_all('td')
    row = [i.text for i in td]
    print(row)

It is supposed to print one list containing all the data in the rows. However, I get a number of lists, with every new list starting with one less record in it.

Kind of like this:

['997', 'Eliezer', 'Asha', '998', 'Jory', 'Jada', '999', 'Misael', 'Leila', '1000', 'Tate', 'Peggy', 'Note: Rank 1 is the most popular,\nrank 2 is the next most popular, and so forth. \n']
['998', 'Jory', 'Jada', '999', 'Misael', 'Leila', '1000', 'Tate', 'Peggy', 'Note: Rank 1 is the most popular,\nrank 2 is the next most popular, and so forth. \n']
['999', 'Misael', 'Leila', '1000', 'Tate', 'Peggy', 'Note: Rank 1 is the most popular,\nrank 2 is the next most popular, and so forth. \n']
['1000', 'Tate', 'Peggy', 'Note: Rank 1 is the most popular,\nrank 2 is the next most popular, and so forth. \n']
['Note: Rank 1 is the most popular,\nrank 2 is the next most popular, and so forth. \n']

How can I print only one list?

Upvotes: 1

Views: 1391

Answers (2)

chitown88

Reputation: 28640

Your loop builds a row list, prints it, and then moves on to the next iteration, where it builds a new row list (overwriting the previous one) and prints that, and so on. That is why you get one printed list per iteration instead of a single list.

I'm not sure why you'd want all the rows in one list, but to end up with one final list you need to add each row list to a final list on every iteration.

Do you actually mean you want a list of your row lists?

right_table = soup.find('table',attrs = {"summary" : "Popularity for top 1000"})
table_rows = right_table.find_all('tr') 


result_list = []
for tr in table_rows:
    td = tr.find_all('td')
    row = [i.text for i in td]
    result_list.extend(row)  # add this row's cells to the one flat list

print(result_list)

If you really meant a list of your rows, then use this one:

right_table = soup.find('table',attrs = {"summary" : "Popularity for top 1000"})
table_rows = right_table.find_all('tr') 


result_list = []
for tr in table_rows:
    td = tr.find_all('td')
    row = [i.text for i in td]
    result_list.append(row)  # keep each row as its own inner list

print(result_list)
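The two snippets above rely on a soup object that isn't shown, so here is a minimal self-contained sketch of both variants side by side. The sample_html string is a hypothetical stand-in for the real page (its rows reuse values from the question's output), and html.parser is just one possible parser choice.

```python
from bs4 import BeautifulSoup

# Hypothetical sample markup; the cell values reuse rows shown in the question.
sample_html = """
<table summary="Popularity for top 1000">
  <tr><td>999</td><td>Misael</td><td>Leila</td></tr>
  <tr><td>1000</td><td>Tate</td><td>Peggy</td></tr>
</table>
"""

soup = BeautifulSoup(sample_html, "html.parser")
right_table = soup.find("table", attrs={"summary": "Popularity for top 1000"})

flat_list = []      # every cell of every row in one list
list_of_rows = []   # one inner list per <tr>
for tr in right_table.find_all("tr"):
    row = [td.text for td in tr.find_all("td")]
    flat_list.extend(row)      # same effect as result_list = result_list + row
    list_of_rows.append(row)

print(flat_list)
print(list_of_rows)
```

Printing once after the loop, rather than inside it, is what gives a single list.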

But honestly, I’d use pandas and .read_html() as QHarr suggests.

If you only want to print the text of each cell, loop over the cells themselves:

right_table = soup.find('table', attrs={"summary": "Popularity for top 1000"})
table_rows = right_table.find_all('tr')

for tr in table_rows:
    td = tr.find_all('td')
    for data in td:
        print(data.text)  # data is a single <td>; td is the whole list of cells
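One more thing that may be worth checking, judging only by the output in the question: each printed list also contains the cells of every later row, which is what you'd see if the page leaves its <tr> tags unclosed and html.parser nests each row inside the previous one. That is an assumption about the page, which isn't shown here, and other parsers may repair such markup differently. Under that assumption, a rough sketch of the effect and one way around it (broken_html is hypothetical markup reusing values from the question):

```python
from bs4 import BeautifulSoup

# Hypothetical markup with unclosed <tr> tags; values reuse rows from the question.
broken_html = """
<table summary="Popularity for top 1000">
<tr><td>998</td><td>Jory</td><td>Jada</td>
<tr><td>999</td><td>Misael</td><td>Leila</td>
<tr><td>1000</td><td>Tate</td><td>Peggy</td>
</table>
"""

soup = BeautifulSoup(broken_html, "html.parser")
table = soup.find("table", attrs={"summary": "Popularity for top 1000"})

# find_all searches recursively by default, so each row also picks up
# the cells of any rows nested inside it.
naive_rows = [[td.text for td in tr.find_all("td")]
              for tr in table.find_all("tr")]

# recursive=False keeps only the <td> tags that are direct children of each <tr>.
direct_rows = [[td.text for td in tr.find_all("td", recursive=False)]
               for tr in table.find_all("tr")]

# Flatten the clean rows into the single list the question asks for.
result_list = [cell for row in direct_rows for cell in row]
print(result_list)
```

If the rows on the real page are properly closed, recursive=False changes nothing and the simpler loops above are enough.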

Upvotes: 0

QHarr

Reputation: 84475

I would try pandas: read_html returns a list of all the tables it finds, so index into that list to get the table you want.

import pandas as pd

tables = pd.read_html('yourURL')

print(tables[1]) # for example; change index as required
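If guessing the index is awkward, read_html also accepts an attrs argument to select a table by its HTML attributes. A minimal sketch, assuming pandas plus a parser backend such as lxml is installed; sample_html is a hypothetical stand-in for the real page, reusing the summary attribute and a couple of rows from the question:

```python
from io import StringIO

import pandas as pd

# Hypothetical sample markup; the cell values reuse rows shown in the question.
sample_html = """
<table summary="Popularity for top 1000">
  <tr><td>999</td><td>Misael</td><td>Leila</td></tr>
  <tr><td>1000</td><td>Tate</td><td>Peggy</td></tr>
</table>
"""

# attrs narrows the result to tables whose attributes match.
# StringIO wraps the literal HTML so it is read as file-like input.
tables = pd.read_html(StringIO(sample_html),
                      attrs={"summary": "Popularity for top 1000"})
df = tables[0]
print(df)

# One flat list of every cell as text, if that is still what you need.
flat_list = df.astype(str).to_numpy().ravel().tolist()
print(flat_list)
```

With a real page you would pass the URL or file path instead of the StringIO object.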

Upvotes: 2
