Anton

Reputation: 4815

Beautiful Soup: 'ResultSet' object has no attribute 'find_all'?

I am trying to scrape a simple table using Beautiful Soup. Here is my code:

import requests
from bs4 import BeautifulSoup

url = 'https://gist.githubusercontent.com/anonymous/c8eedd8bf41098a8940b/raw/c7e01a76d753f6e8700b54821e26ee5dde3199ab/gistfile1.txt'
r = requests.get(url)

soup = BeautifulSoup(r.text)
table = soup.find_all(class_='dataframe')

first_name = []
last_name = []
age = []
preTestScore = []
postTestScore = []

for row in table.find_all('tr'):
    col = table.find_all('td')

    column_1 = col[0].string.strip()
    first_name.append(column_1)

    column_2 = col[1].string.strip()
    last_name.append(column_2)

    column_3 = col[2].string.strip()
    age.append(column_3)

    column_4 = col[3].string.strip()
    preTestScore.append(column_4)

    column_5 = col[4].string.strip()
    postTestScore.append(column_5)

columns = {'first_name': first_name, 'last_name': last_name, 'age': age, 'preTestScore': preTestScore, 'postTestScore': postTestScore}
df = pd.DataFrame(columns)
df

However, whenever I run it, I get this error:

---------------------------------------------------------------------------
AttributeError                            Traceback (most recent call last)
<ipython-input-116-a900c2872793> in <module>()
     14 postTestScore = []
     15 
---> 16 for row in table.find_all('tr'):
     17     col = table.find_all('td')
     18 

AttributeError: 'ResultSet' object has no attribute 'find_all'

I have read around a dozen StackOverflow questions about this error, and I cannot figure out what I am doing wrong.

Upvotes: 48

Views: 108861

Answers (3)

Ralf Haring

Reputation: 1243

The table variable holds a ResultSet, which behaves like a list of the matching elements. You need to call find_all on one of its members (even if you know it has only one member), not on the ResultSet itself.

>>> type(table)
<class 'bs4.element.ResultSet'>
>>> type(table[0])
<class 'bs4.element.Tag'>
>>> len(table[0].find_all('tr'))
6
>>>
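To reproduce the difference without fetching the page, here is a minimal sketch using a small inline table. The HTML and its values are made up for illustration (they are not the asker's data), and html.parser is simply the built-in parser, chosen so nothing extra needs installing.

```python
from bs4 import BeautifulSoup

# Hypothetical stand-in for the page in the question.
html = """
<table class="dataframe">
  <tr><th>first_name</th><th>age</th></tr>
  <tr><td>Ada</td><td>36</td></tr>
  <tr><td>Alan</td><td>41</td></tr>
</table>
"""

soup = BeautifulSoup(html, "html.parser")

# find_all returns a ResultSet: a list-like container of every match.
tables = soup.find_all(class_="dataframe")

# Indexing into it gives a single Tag, which does have find_all.
table = tables[0]
row_count = len(table.find_all("tr"))

print(type(tables).__name__, type(table).__name__, row_count)
```

Calling tables.find_all('tr') here would raise the same AttributeError as in the question, because the method lives on the Tag, not on the container.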

Upvotes: 57

otus

Reputation: 5732

table = soup.find_all(class_='dataframe')

This gives you a result set, i.e. all the elements that match the class. You can either iterate over them or, if you know there is only one element with the dataframe class, use find instead. Judging by your code, the latter is what you need to fix the immediate problem:

table = soup.find(class_='dataframe')

However, that is not all:

for row in table.find_all('tr'):
    col = table.find_all('td')

You probably want to iterate over the tds in the row here, rather than over the whole table. Otherwise col holds every td in the table on each pass, so col[0] through col[4] always come from the first row and you see that row over and over.

for row in table.find_all('tr'):
    for col in row.find_all('td'):
        print(col.string.strip())  # handle one cell at a time
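Putting both fixes together, a rough sketch of the whole loop might look like the following. The inline HTML, the names list, and the sample values are assumptions for illustration only; the real page has more columns. Rows that contain no td cells (such as a header row) are skipped.

```python
from bs4 import BeautifulSoup

# Hypothetical sample; not the data from the question's URL.
html = """
<table class="dataframe">
  <thead>
    <tr><th></th><th>first_name</th><th>last_name</th><th>age</th></tr>
  </thead>
  <tbody>
    <tr><th>0</th><td>Ada</td><td>Lovelace</td><td>36</td></tr>
    <tr><th>1</th><td>Alan</td><td>Turing</td><td>41</td></tr>
  </tbody>
</table>
"""

soup = BeautifulSoup(html, "html.parser")
table = soup.find(class_="dataframe")  # a single Tag, not a ResultSet

names = ["first_name", "last_name", "age"]
columns = {name: [] for name in names}

for row in table.find_all("tr"):
    cells = row.find_all("td")  # cells of this row only
    if not cells:
        continue  # header row: only <th> elements, nothing to collect
    for name, cell in zip(names, cells):
        columns[name].append(cell.get_text(strip=True))

print(columns)
```

The resulting dict of lists has the same shape as the columns dict in the question, so it could be passed to pd.DataFrame after importing pandas as pd.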

Upvotes: 16

Padraic Cunningham

Reputation: 180441

Iterate over table and call row.find_all('td') on each element:

for row in table:
    col = row.find_all('td')
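One thing to keep in mind when iterating over the result of find_all(class_='dataframe'): each item is a whole matching element (a table here), not a row. A small sketch, assuming a made-up page with two such tables, that walks each table and then its rows:

```python
from bs4 import BeautifulSoup

# Hypothetical page with two tables sharing the same class.
html = """
<table class="dataframe"><tr><td>a</td><td>b</td></tr></table>
<table class="dataframe"><tr><td>c</td></tr><tr><td>d</td></tr></table>
"""

soup = BeautifulSoup(html, "html.parser")

cells_per_table = []
for table in soup.find_all(class_="dataframe"):  # one Tag per matching table
    rows = [
        [td.get_text(strip=True) for td in tr.find_all("td")]
        for tr in table.find_all("tr")           # rows of this table only
    ]
    cells_per_table.append(rows)

print(cells_per_table)
```

This shape is mainly useful when the class can match more than one element; with a single table, find is the simpler choice.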

Upvotes: 2
