Stephen Smith

Reputation: 352

Type Error: Result Set Is Not Callable - BeautifulSoup

I am learning web scraping and can't seem to get past some of the basics. My code raises the error "TypeError: 'ResultSet' object is not callable".

I've tried a number of different things. I originally used "find" instead of "find_all", but BeautifulSoup was returning a NoneType. I couldn't write an if check that handled that case, so I switched to "find_all".

page = requests.get('https://topworkplaces.com/publication/ocregister/')

soup = BeautifulSoup(page.text,'html.parser')
all_company_list = soup.find_all(class_='sortable-table')
#all_company_list = soup.find(class_='sortable-table')


company_name_list_items = all_company_list('td')

for company_name in company_name_list_items:
    #print(company_name.prettify())
    companies = company_name.content[0]

I'd like this to pull in all the companies in Orange County, California that are on this list. I've already managed to pull them in, but I want the resulting list to be clean.

Upvotes: 1

Views: 1118

Answers (2)

You've got the right idea. Finding all the <td> tags at once returns one <td> for every row (140 rows) and every column in that row (4 columns). If you only want the company names, it might be easier to find all the rows (<tr> tags) first, then collect however many columns you want by iterating over the <td>s in each row. This will get the first column, the company names:

import requests
from bs4 import BeautifulSoup

page = requests.get('https://topworkplaces.com/publication/ocregister/')

soup = BeautifulSoup(page.text,'html.parser')
all_company_list = soup.find_all('tr')

company_list = [c.find('td').text for c in all_company_list[1::]]

Now company_list contains all 140 company names:

>>> print(len(company_list))
140
>>> print(company_list)
['Advanced Behavioral Health', 'Advanced Management Company & R³ Construction Services, Inc.',
...
, 'Wes-Tec, Inc', 'Western Resources Title Company', 'Wunderman', 'Ytel, Inc.', 'Zillow Group']

Change c.find('td') to c.find_all('td') and iterate that list to get all the columns for each company.
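As a rough offline sketch of that change, and of why the original code failed: find_all returns a ResultSet (a list of tags), which cannot be called, while each individual row tag does have find_all. The sample HTML below is made up for illustration, so the names, columns, and values are assumptions rather than the real page's content.

```python
from bs4 import BeautifulSoup

# Hypothetical stand-in for the real page; the actual markup may differ.
SAMPLE_HTML = """
<table class="sortable-table" id="twpRegionalList">
  <tr><th>Company</th><th>Rank</th><th>Size</th></tr>
  <tr><td>Acme Corp</td><td>2</td><td>Large</td></tr>
  <tr><td>Zeta LLC</td><td>1</td><td>Small</td></tr>
</table>
"""

soup = BeautifulSoup(SAMPLE_HTML, "html.parser")

# find_all returns a ResultSet (a list of Tag objects), not a single Tag,
# so calling it like a function raises the TypeError from the question.
tables = soup.find_all(class_="sortable-table")
error_message = None
try:
    tables("td")
except TypeError as exc:
    error_message = str(exc)
print(error_message)

# Per-row approach: skip the header row, then collect every <td> in each row.
rows = soup.find_all("tr")
company_rows = [
    [td.get_text(strip=True) for td in row.find_all("td")]
    for row in rows[1:]
]
print(company_rows)
```

Each inner list holds one company's columns in page order, so company_rows[0][0] is the first company's name.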

Upvotes: 1

QHarr

Reputation: 84465

Pandas:

Pandas is often useful here. The page can be sorted in several ways, including by company size and by rank. I show the rank sort.

import pandas as pd

table = pd.read_html('https://topworkplaces.com/publication/ocregister/')[0]
table.columns = table.iloc[0]
table = table[1:]
table.Rank = pd.to_numeric(table.Rank)
rank_sort_table = table.sort_values(by='Rank', axis=0, ascending = True)
rank_sort_table.reset_index(inplace=True, drop=True)
rank_sort_table.columns.names = ['Index']
print(rank_sort_table)

Depending on your sort, companies in order:

print(rank_sort_table.Company)
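To try the header promotion and rank sort without hitting the site, here is a small sketch on a hand-built DataFrame. The data is invented and only mimics a table whose header row was read in as the first data row; the variable names are illustrative, and .copy() is added here to avoid assigning into a slice.

```python
import pandas as pd

# Hypothetical data: the first row holds the column names, as in the scraped table.
raw = pd.DataFrame([
    ["Company", "Rank"],
    ["Acme Corp", "2"],
    ["Zeta LLC", "1"],
])

table = raw.copy()
table.columns = table.iloc[0]   # promote the first row to column labels
table = table[1:].copy()        # drop that row from the data
table["Rank"] = pd.to_numeric(table["Rank"])  # strings -> numbers so the sort is numeric

rank_sorted = table.sort_values(by="Rank").reset_index(drop=True)
companies_by_rank = rank_sorted["Company"].tolist()
print(companies_by_rank)
```

Converting Rank with pd.to_numeric matters because sorting the raw strings would place "10" before "2".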

Requests:

Incidentally, you can use nth-of-type to select just the first column (company names), and use the table's id rather than its class name to identify the table, as selecting by id is faster:

import requests
from bs4 import BeautifulSoup as bs

r = requests.get('https://topworkplaces.com/publication/ocregister/')
soup = bs(r.content, 'lxml')
names = [item.text for item in soup.select('#twpRegionalList td:nth-of-type(1)')]
print(names)

Note that the default sort is alphabetical by the name column rather than by rank.
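If rank order is wanted from the selector approach as well, one possible sketch is to select the rank column with a second nth-of-type selector and sort the names by it. The sample HTML is invented, and treating the second column as the rank is an assumption about the table layout.

```python
from bs4 import BeautifulSoup

# Hypothetical markup; assumes column 1 is the name and column 2 is the rank.
SAMPLE_HTML = """
<table id="twpRegionalList">
  <tr><td>Acme Corp</td><td>2</td></tr>
  <tr><td>Zeta LLC</td><td>1</td></tr>
</table>
"""

soup = BeautifulSoup(SAMPLE_HTML, "html.parser")

# One selector per column; both lists come back in row order.
names = [td.get_text(strip=True)
         for td in soup.select("#twpRegionalList td:nth-of-type(1)")]
ranks = [int(td.get_text(strip=True))
         for td in soup.select("#twpRegionalList td:nth-of-type(2)")]

# Pair each rank with its name, then sort by rank.
names_by_rank = [name for _, name in sorted(zip(ranks, names))]
print(names_by_rank)
```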


Reference:

https://pandas.pydata.org/pandas-docs/stable/reference/api/pandas.DataFrame.sort_values.html

Upvotes: 1
