Reputation: 802
My goal is to get each link.
My code prints the href/link, but it also prints other junk that I do not want.
I only want the href.
```python
from selenium import webdriver
from bs4 import BeautifulSoup
import pandas as pd
import time
import requests

driver = webdriver.Chrome()
productlink = []
for x in range(1, 3):
    driver.get(f'https://meetinglibrary.asco.org/browse-meetings/2021%20Gastrointestinal%20Cancers%20Symposium?page={x}')
    time.sleep(3)
    page_source = driver.page_source
    soup = BeautifulSoup(page_source, 'html.parser')
    productlist = soup.find_all('div', class_='session')
    for item in productlist:
        for link in item.find_all('a', class_='session__button ng-star-inserted', href=True):
            print(link)
```
Upvotes: 0
Views: 56
Reputation: 12672
Because `href=True` only means "get the tags that have an `href` attribute", the results are still `Tag` objects, and printing a `Tag` prints the whole element. To get the href itself, you also need to call `.get("href")`. Since there is only one button in each `session` tag, you could use `find` instead of `find_all`. And don't forget to prepend the `baseURL`, because the hrefs are relative. Try the code below:
```python
import time

from bs4 import BeautifulSoup
from selenium import webdriver

driver = webdriver.Chrome()
productlink = []
baseURL = 'https://meetinglibrary.asco.org'

for x in range(1, 3):
    driver.get(f'https://meetinglibrary.asco.org/browse-meetings/2021%20Gastrointestinal%20Cancers%20Symposium?page={x}')
    time.sleep(3)  # give the page time to render the session list
    page_source = driver.page_source
    soup = BeautifulSoup(page_source, 'html.parser')
    productlist = soup.find_all('div', class_='session')
    for item in productlist:
        # find() returns the single button Tag (or None if there is no button)
        button = item.find('a', class_='session__button ng-star-inserted', href=True)
        if button is not None:
            # .get("href") returns only the attribute value, e.g. "/session/13455"
            link = baseURL + button.get("href")
            productlink.append(link)
            print(link)

driver.quit()  # close the browser when done
```
Output:
https://meetinglibrary.asco.org/session/13455
https://meetinglibrary.asco.org/session/13458
https://meetinglibrary.asco.org/session/13445
https://meetinglibrary.asco.org/session/13450
https://meetinglibrary.asco.org/session/13460
https://meetinglibrary.asco.org/session/13462
https://meetinglibrary.asco.org/session/13464
https://meetinglibrary.asco.org/session/13459
https://meetinglibrary.asco.org/session/13446
https://meetinglibrary.asco.org/session/13451
https://meetinglibrary.asco.org/session/13461
https://meetinglibrary.asco.org/session/13463
https://meetinglibrary.asco.org/session/13465
https://meetinglibrary.asco.org/session/13399
https://meetinglibrary.asco.org/session/13443
https://meetinglibrary.asco.org/session/13444
https://meetinglibrary.asco.org/session/13352
https://meetinglibrary.asco.org/session/13381
https://meetinglibrary.asco.org/session/13383
https://meetinglibrary.asco.org/session/13372
https://meetinglibrary.asco.org/session/13382
https://meetinglibrary.asco.org/session/13447
https://meetinglibrary.asco.org/session/13849
https://meetinglibrary.asco.org/session/13384
https://meetinglibrary.asco.org/session/13389
https://meetinglibrary.asco.org/session/13453
https://meetinglibrary.asco.org/session/13859
https://meetinglibrary.asco.org/session/13391
https://meetinglibrary.asco.org/session/13392
https://meetinglibrary.asco.org/session/13394
....
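To see the difference between printing a `Tag` and calling `.get("href")` without launching a browser, here is a minimal sketch that runs the same extraction on a small hand-written HTML snippet. The markup, the `/session/111`-style paths and the helper name `extract_session_links` are hypothetical; the sketch only assumes the page structure the answer relies on (a `div.session` holding one `a.session__button`). It also swaps plain string concatenation for `urllib.parse.urljoin`, which gives the same result for root-relative hrefs like these but also copes with hrefs that are already absolute.

```python
from urllib.parse import urljoin

from bs4 import BeautifulSoup

# Hypothetical markup that only imitates the structure assumed in the answer
# (div.session containing one a.session__button); it is not the real page.
SAMPLE_HTML = """
<div class="session">
  <h3>Session A</h3>
  <a class="session__button ng-star-inserted" href="/session/111">View</a>
</div>
<div class="session">
  <h3>Session B</h3>
  <a class="session__button ng-star-inserted" href="/session/222">View</a>
</div>
<div class="session">
  <h3>Session C (no button)</h3>
</div>
"""

BASE_URL = "https://meetinglibrary.asco.org"


def extract_session_links(html, base_url=BASE_URL):
    """Return absolute URLs for the session button in each div.session (hypothetical helper)."""
    soup = BeautifulSoup(html, "html.parser")
    links = []
    for session in soup.find_all("div", class_="session"):
        # find() returns a single Tag (or None), not a list like find_all()
        button = session.find("a", class_="session__button", href=True)
        if button is None:
            continue  # skip sessions without a button instead of raising AttributeError
        # button is still a Tag; .get("href") pulls out just the attribute string
        links.append(urljoin(base_url, button.get("href")))
    return links


if __name__ == "__main__":
    soup = BeautifulSoup(SAMPLE_HTML, "html.parser")
    first_button = soup.find("a", class_="session__button")
    print(first_button)              # the whole <a ...>View</a> element (the "junk")
    print(first_button.get("href"))  # only the href value
    for url in extract_session_links(SAMPLE_HTML):
        print(url)
```

Matching on the single class `session__button` rather than the full string `session__button ng-star-inserted` is a deliberate choice in this sketch: BeautifulSoup matches a single class name against any of an element's classes, whereas a string containing a space must equal the whole `class` attribute exactly, so it can miss elements if `ng-star-inserted` (which appears to be added by Angular) is absent or the classes come in a different order.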
Upvotes: 1