Reputation: 802
My code gets links (from the HTML) in different "sections" of a page.
It prints 2 links per section, but I only want the first one printed.
The expected output should also not contain the links ending with "video", which my current output does.
from selenium import webdriver
from bs4 import BeautifulSoup
import time

driver = webdriver.Chrome()
jam = []
baseurl = 'https://meetinglibrary.asco.org'
driver.get('https://meetinglibrary.asco.org/results?meetingView=2020%20ASCO%20Virtual%20Scientific%20Program&page=1')
time.sleep(3)  # give the Angular page time to render
page_source = driver.page_source
soup = BeautifulSoup(page_source, 'html.parser')
productlist = soup.find_all('a', class_='ng-star-inserted')
for item in productlist:
    # every href inside the section is collected, so each section adds 2 links
    for link in item.find_all('a', href=True):
        jam.append(baseurl + link['href'])
print(jam)
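A minimal sketch of the filtering the question asks for, assuming each section's hrefs have already been collected into a list (for example from item.find_all('a', href=True) on each productlist entry) and assuming "the first one" means the first link that does not end in "video". The function name first_non_video_links and the sample hrefs are hypothetical; the real page's link order may differ.

```python
def first_non_video_links(sections, baseurl):
    """Return one absolute URL per section: the first href not ending in 'video'."""
    result = []
    for hrefs in sections:
        for href in hrefs:
            # ignore a trailing slash so '/record/1/video/' is also treated as video
            if not href.rstrip('/').endswith('video'):
                result.append(baseurl + href)
                break  # stop after the first acceptable link in this section
    return result


# Hypothetical hrefs grouped by section; the real page structure may differ.
sections = [
    ['/record/185955/abstract', '/record/185955/video'],
    ['/record/190197/video', '/record/190197/slide'],
]
print(first_non_video_links(sections, 'https://meetinglibrary.asco.org'))
```

The break is what limits the output to one link per section. If you instead want the literal first link and simply want to discard it when it is a video, check only hrefs[0] for each section.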
Upvotes: 0
Views: 47
Reputation: 66
You can add a condition that filters the links before appending them.
...
for item in productlist:
    ahrefs = item.find_all('a', href=True)
    for index in range(len(ahrefs)):
        # keep only even positions (the first link of each pair) and skip "video" links
        if (index % 2 == 0) and ('video' not in ahrefs[index]['href']):
            jam.append(baseurl + ahrefs[index]['href'])
print(jam)
...
Let me know how it goes. Good luck!
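The same condition can be written with enumerate. Here it is as a hypothetical standalone function over one section's hrefs, so it runs without Selenium; the function name and sample hrefs are made up for illustration.

```python
def even_index_non_video(hrefs, baseurl):
    """Keep hrefs at even positions (0, 2, 4, ...) that do not contain 'video'."""
    return [
        baseurl + href
        for index, href in enumerate(hrefs)
        if index % 2 == 0 and 'video' not in href
    ]


# Hypothetical hrefs from a single section.
hrefs = ['/record/1/abstract', '/record/1/video', '/record/2/video', '/record/2/slide']
print(even_index_non_video(hrefs, 'https://meetinglibrary.asco.org'))
```

Because the index restarts for every section, index % 2 == 0 on a two-link section simply means "the first link". If that first link is the video, nothing is kept for the pair (record 2 in the sample), and a section with more than two links keeps every other one.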
Upvotes: 1
Reputation: 12672
Use os.path.basename to get the last part of the URL, and use the in
operator to check whether that part contains "video":
from selenium import webdriver
from bs4 import BeautifulSoup
import time
import os

driver = webdriver.Chrome()
jam = []
baseurl = 'https://meetinglibrary.asco.org'
driver.get('https://meetinglibrary.asco.org/results?meetingView=2020%20ASCO%20Virtual%20Scientific%20Program&page=1')
time.sleep(3)  # wait for the page to render
page_source = driver.page_source
soup = BeautifulSoup(page_source, 'html.parser')
productlist = soup.find_all('a', class_='ng-star-inserted')
for item in productlist:
    for link in item.find_all('a', href=True):
        url = link['href']
        # basename of '/record/185955/video' is 'video', so video links are skipped
        if "video" not in os.path.basename(url):
            jam.append(baseurl + url)
print(jam)
Result:
['https://meetinglibrary.asco.org/record/185955/abstract',
'https://meetinglibrary.asco.org/record/185955/slide',
'https://meetinglibrary.asco.org/record/185954/abstract',
'https://meetinglibrary.asco.org/record/186048/abstract',
'https://meetinglibrary.asco.org/record/186048/slide',
'https://meetinglibrary.asco.org/record/190197/slide',
'https://meetinglibrary.asco.org/record/192623/slide',
'https://meetinglibrary.asco.org/record/185414/abstract',
'https://meetinglibrary.asco.org/record/185414/slide',
'https://meetinglibrary.asco.org/record/185415/abstract',
'https://meetinglibrary.asco.org/record/185415/slide',
'https://meetinglibrary.asco.org/record/185473/abstract',
'https://meetinglibrary.asco.org/record/185473/slide',
'https://meetinglibrary.asco.org/record/187584/slide',
'https://meetinglibrary.asco.org/record/188561/slide',
'https://meetinglibrary.asco.org/record/186710/abstract',
'https://meetinglibrary.asco.org/record/186710/slide',
'https://meetinglibrary.asco.org/record/186699/abstract',
'https://meetinglibrary.asco.org/record/186699/slide',
'https://meetinglibrary.asco.org/record/186698/abstract',
'https://meetinglibrary.asco.org/record/186698/slide',
'https://meetinglibrary.asco.org/record/187720/slide',
'https://meetinglibrary.asco.org/record/187480/abstract',
'https://meetinglibrary.asco.org/record/187480/slide',
'https://meetinglibrary.asco.org/record/191961/slide',
'https://meetinglibrary.asco.org/record/192626/slide',
'https://meetinglibrary.asco.org/record/186983/abstract',
'https://meetinglibrary.asco.org/record/186983/slide',
'https://meetinglibrary.asco.org/record/188580/abstract',
'https://meetinglibrary.asco.org/record/188580/slide',
'https://meetinglibrary.asco.org/record/189047/abstract',
'https://meetinglibrary.asco.org/record/189047/slide',
'https://meetinglibrary.asco.org/record/190223/slide',
'https://meetinglibrary.asco.org/record/190273/slide',
'https://meetinglibrary.asco.org/record/184812/abstract',
'https://meetinglibrary.asco.org/record/184812/slide',
'https://meetinglibrary.asco.org/record/184927/slide',
'https://meetinglibrary.asco.org/record/184805/abstract',
'https://meetinglibrary.asco.org/record/184805/slide',
'https://meetinglibrary.asco.org/record/184811/abstract',
'https://meetinglibrary.asco.org/record/184811/slide',
'https://meetinglibrary.asco.org/record/185576/slide',
'https://meetinglibrary.asco.org/record/190147/slide']
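The os.path.basename check looks only at the final path segment. A slightly more defensive sketch (an alternative, not the answer's code) strips any query string with urllib.parse.urlparse and any trailing slash before taking the segment with posixpath.basename, since URLs always use '/' regardless of the operating system. The helper name last_segment is hypothetical.

```python
import posixpath
from urllib.parse import urlparse


def last_segment(url):
    """Return the final path segment of a URL or root-relative href."""
    # urlparse separates the path from the query string, so '?tab=video'
    # is not mistaken for part of the last segment; rstrip('/') keeps a
    # trailing slash from producing an empty basename.
    return posixpath.basename(urlparse(url).path.rstrip('/'))


for href in ['/record/185955/abstract',
             '/record/185955/video',
             'https://meetinglibrary.asco.org/record/190197/slide']:
    segment = last_segment(href)
    print(href, '->', segment, 'video' in segment)
```

As the result above shows, filtering out video links still leaves both the abstract and the slide link for many records, so it does not by itself limit the output to one link per section.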
Upvotes: 1