Big_Data_engineer

Reputation: 3

Extract data in Python using beautifulsoup

I am trying to extract data from https://ash.confex.com/ash/2019/webprogram/start.htm and I'm getting an error with BeautifulSoup's find_all.

import webbrowser
import os
import requests
from bs4 import BeautifulSoup
import sys
import wget
import pandas as pd
import time
from selenium import webdriver
from selenium.webdriver.common.keys import Keys
from selenium.webdriver.support.ui import WebDriverWait
from selenium.webdriver.support import expected_conditions as EC
from selenium.webdriver.common.by import By

driver = webdriver.Chrome('D:\\crome drive\\chromedriver.exe')
driver.get('https://ash.confex.com/ash/2019/webprogram/start.html')
searchterm = driver.find_element_by_id("words").send_keys("CAR-T")
driver.find_element_by_name("submit").click()
#driver.find_element_by_tag_name("resulttitle")
#driver.find_element_by_class_name("a")

soup_level1=BeautifulSoup(driver.page_source, 'lxml')
#fl=soup_level1.find_all(class_='soup_level1')
results = soup_level1.find_all('div', attrs={'class':'resulttitle'})
tag = results.findall('a', attrs='href')

I am getting this error:

AttributeError: ResultSet object has no attribute 'findall'. You're probably treating a list of items like a single item. Did you call find_all() when you meant to call find()?

Upvotes: 0

Views: 256

Answers (1)

Joseph Rajchwald

Reputation: 487

Yeah, it's exactly as the error says: the find_all method is meant to be called on a single HTML tree (a Tag or a soup), but in your code the variable results is a ResultSet object. In bs4 that is essentially a list where each item is an HTML tree, so it has no find_all method. (Note also that you wrote findall, which doesn't exist anywhere in bs4 — the correct spelling is find_all.)

results = soup_level1.find_all('div', attrs={'class':'resulttitle'})
print(type(results))   # <class 'bs4.element.ResultSet'>
print(results)   # []

This also shows that your results is empty. I searched through the HTML of the page and didn't see any div with class="resulttitle", so you may want to double-check the selector you're looking for.
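To illustrate (with made-up HTML standing in for driver.page_source, since the real page's markup isn't shown here), find_all simply returns an empty ResultSet when nothing matches, without raising an error:

```python
from bs4 import BeautifulSoup

# Hypothetical HTML in place of driver.page_source
html = '<div class="othertitle"><a href="/x">Link</a></div>'
soup = BeautifulSoup(html, 'html.parser')

# No div has class "resulttitle", so find_all returns an empty ResultSet
results = soup.find_all('div', attrs={'class': 'resulttitle'})
print(results)       # []
print(len(results))  # 0
```

An empty ResultSet is falsy, so a quick `if not results:` check right after the search will tell you whether the selector matched anything.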

In theory, if your results variable weren't empty, you could loop through each item in results and then find all of the links you're looking for:

results = soup_level1.find_all('div', attrs={'class':'resulttitle'})
for result in results:
    # find_all works on each individual Tag inside the ResultSet;
    # href=True matches only <a> tags that actually have an href attribute
    tag_list = result.find_all('a', href=True)
    # tag_list is another ResultSet, where each item is an HTML tree
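Putting that together, here's a self-contained sketch (again using stand-in HTML with the structure your code expects, not the actual page) of looping over the ResultSet and collecting the href values:

```python
from bs4 import BeautifulSoup

# Stand-in HTML shaped like the results the question is after
html = '''
<div class="resulttitle"><a href="/paper1.html">Paper 1</a></div>
<div class="resulttitle"><a href="/paper2.html">Paper 2</a></div>
'''
soup = BeautifulSoup(html, 'html.parser')

links = []
for result in soup.find_all('div', attrs={'class': 'resulttitle'}):
    # Each result is a single Tag, so find_all is valid here
    for a in result.find_all('a', href=True):
        links.append(a['href'])

print(links)  # ['/paper1.html', '/paper2.html']
```

In your script you would pass driver.page_source to BeautifulSoup instead of the literal string, once you've confirmed what class the result divs actually use.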

Upvotes: 1
