Reputation: 3
I am trying to extract data from https://ash.confex.com/ash/2019/webprogram/start.htm and am getting an error with BeautifulSoup's find_all.
import webbrowser
import os
import requests
from bs4 import BeautifulSoup
import sys
import wget
import pandas as pd
import time
from selenium import webdriver
from selenium.webdriver.common.keys import Keys
from selenium.webdriver.support.ui import WebDriverWait
from selenium.webdriver.support import expected_conditions as EC
from selenium.webdriver.common.by import By
driver = webdriver.Chrome('D:\\crome drive\\chromedriver.exe')
driver.get('https://ash.confex.com/ash/2019/webprogram/start.html')
searchterm = driver.find_element_by_id("words").send_keys("CAR-T")
driver.find_element_by_name("submit").click()
#driver.find_element_by_tag_name("resulttitle")
#driver.find_element_by_class_name("a")
soup_level1=BeautifulSoup(driver.page_source, 'lxml')
#fl=soup_level1.find_all(class_='soup_level1')
results = soup_level1.find_all('div', attrs={'class':'resulttitle'})
tag = results.findall('a', attrs='href')
I am getting the following error:
AttributeError: ResultSet object has no attribute 'findall'. You're probably treating a list of items like a single item. Did you call find_all() when you meant to call find()?
Upvotes: 0
Views: 256
Reputation: 487
Yes, it's exactly as the error says: the find_all method is meant to be called on a single HTML tree (a BeautifulSoup or Tag object), but in your code the variable results is a ResultSet object. In bs4 this is a list in which each item is an HTML tree. Note also that the method is spelled find_all, not findall.
results = soup_level1.find_all('div', attrs={'class':'resulttitle'})
print(type(results)) # <class 'bs4.element.ResultSet'>
print(results) # []
This also shows that your results is empty. I searched through the HTML of the page and didn't see any div with class="resulttitle", so you may want to double-check what you're looking for.
In theory, if your results variable weren't empty, you could loop through each item in results and then find all of the links you're looking for:
results = soup_level1.find_all('div', attrs={'class':'resulttitle'})
for result in results:
    # href=True matches only <a> tags that actually have an href attribute
    tag_list = result.find_all('a', href=True)
    # this yields another ResultSet where each item is an HTML tree (a Tag)
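To see the loop working end to end without Selenium, here is a minimal sketch run against a small hand-written HTML string. The markup, the /paper/... paths, and the extract_links helper are all made up for illustration; the real page may use different tags and class names, so treat the selector as an assumption to verify in your browser's inspector.

```python
from bs4 import BeautifulSoup

# Hypothetical markup standing in for the real search results page.
html = """
<div class="resulttitle"><a href="/paper/1">First</a></div>
<div class="resulttitle"><a href="/paper/2">Second</a> <a>no href</a></div>
<div class="other"><a href="/skip">Skip</a></div>
"""


def extract_links(soup):
    """Collect href values from <a> tags inside div.resulttitle elements."""
    links = []
    # find_all returns a ResultSet (a list of Tag objects), so iterate over it
    for div in soup.find_all("div", class_="resulttitle"):
        # each div is a single Tag, so find_all can be called on it
        for a in div.find_all("a", href=True):
            links.append(a["href"])
    return links


# html.parser ships with Python, so lxml is not needed for this sketch
soup = BeautifulSoup(html, "html.parser")
links = extract_links(soup)
print(links)
```

If extract_links returns an empty list on the real page, the selector probably doesn't match anything in driver.page_source, which is the same symptom as the empty results above.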
Upvotes: 1