B.M. Corwen

Reputation: 217

Finding an element through id, class, xpath, css selector returns none when webscraping with selenium and beautifulsoup

I am new to webscraping, and using beautifulsoup and selenium. I am trying to scrape data from the following webpage:

    https://epl.bibliocommons.com/item/show/2300646980

I am trying to scrape the section "Staff Lists that include this Title". In particular, I want to count the <li> tags, as I only need the number of items/links on that staff list.

I have tried the following on the HTML code provided by "Inspect"-ing the page. The following is the block of HTML code I am trying to scrape from:

<div class="ugc_bandage">
  <div class="lists_heading clearfix">
    <h3 data-test-id="ugc-lists-heading">
      Listed
    </h3>
    <div class="ugc_add_link">
      <div class="dropdown saveToButton clearfix" id="save_to_2300646980_id_7a3ateh0panp1uv0he1v7aqmj9" data-test-id="add-to-list-dropdown-container">
  <a href="#" aria-expanded="false" aria-haspopup="true" class=" dropdown-toggle dropdown-toggle hide_trigger_icon" data-test-id="add-to-list-save-button" data-toggle="dropdown" id="save_button_2300646980_id_7a3ateh0panp1uv0he1v7aqmj9" rel="nofollow">
       <i aria-hidden="true" class=" icon-plus"></i>
<span aria-hidden="true">Add</span><span class="sr-only" data-js="sr-only-dropdown-toggle" data-text-collapsed="Add, collapsed" data-text-expanded="Add, expanded">Add, collapsed</span><span aria-hidden="true" class="icon-arrow"></span></a>  
  <ul class="dropdown-menu">
      <li>
        <a href="/user_lists/new?bib=2300646980&amp;origin=https%3A%2F%2Fepl.bibliocommons.com%2Fitem%2Fload_ugc_content%2F2300646980" class="newList">Create a New List</a>
      </li>
      <li>
        <a href="/lists/add_bib/mine?bib=2300646980_fangirl" data-js="cp-overlay" id="more_lists_id_7a3ateh0panp1uv0he1v7aqmj9">Existing Lists »</a>
      </li>

  </ul>
</div>

    </div>
  </div>
  <h4 data-test-id="staff-lists-that-include-this-title">Staff Lists that include this Title</h4>
  <div data-analytics="{ &quot;SubFeature&quot;: &quot;Lists that include this title&quot; }" class="expand clearfix" id="all_lists_expand" testid="text_listsincluding">
    <ul class="further_list">
      <li> [LIST ENTRIES START HERE, BUT THERE ARE SO MANY THAT IT WOULD MAKE THIS POST TOO LONG.] </li>

  1. I tried locating the section with the XPath copied from inspecting the staff list section (id="all_lists_expand"):
    element = driver.find_elements_by_xpath('//*[@id="rightBar"]/div[3]/div')
  2. I tried locating the section by class name:
    element = driver.find_element_by_class_name('expand clearfix')
  3. I also tried the CSS selector:
    element = driver.find_element_by_css_selector('#all_lists_expand')

I have also done other variants of the code above, looking for classes of the element's parents, xpaths, etc.

All of the above attempts return None. I am not sure what I am doing wrong. Am I supposed to trigger an event or something using selenium? I am not clicking on any of the links in the list, or even keeping a list of the links; I just need to count how many links there are to begin with.

Upvotes: 2

Views: 2264

Answers (3)

QHarr

Reputation: 84465

You don't need the expense of selenium. You can make the same GET request the page makes for that content, then extract the HTML from the JSON returned, parse it with bs4, and extract the links.

import requests
from bs4 import BeautifulSoup as bs

r = requests.get('https://epl.bibliocommons.com/item/load_ugc_content/2300646980').json()
soup = bs(r['html'], 'lxml')
links = [i['href'] for i in soup.select('[data-test-id="staff-lists-that-include-this-title"] + div [href]')]
print(len(links))
print(links)
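
Since the question only asks for the number of items, here is a minimal offline sketch of just the counting step. It uses the standard library's html.parser on a made-up fragment, so SAMPLE_HTML, ListItemCounter and count_list_items are hypothetical names for illustration, and it assumes the returned HTML keeps the id="all_lists_expand" container shown in the question.

```python
from html.parser import HTMLParser

# Hypothetical stand-in for the 'html' value of the JSON response.
SAMPLE_HTML = """
<h4 data-test-id="staff-lists-that-include-this-title">Staff Lists that include this Title</h4>
<div id="all_lists_expand">
  <ul class="further_list">
    <li><a href="/list/share/1_a/10_first">First</a></li>
    <li><a href="/list/share/2_b/20_second">Second</a></li>
    <li class="extra"><a href="/list/share/3_c/30_third">Third</a></li>
  </ul>
</div>
"""

# Tags that never have a closing tag, so they must not affect nesting depth.
VOID_TAGS = {"area", "base", "br", "col", "embed", "hr", "img",
             "input", "link", "meta", "source", "track", "wbr"}


class ListItemCounter(HTMLParser):
    """Counts <li> tags nested inside the element with the given id."""

    def __init__(self, container_id):
        super().__init__()
        self.container_id = container_id
        self.depth = 0   # greater than 0 while inside the container
        self.count = 0

    def handle_starttag(self, tag, attrs):
        if tag in VOID_TAGS:
            return
        if self.depth:
            self.depth += 1
            if tag == "li":
                self.count += 1
        elif dict(attrs).get("id") == self.container_id:
            self.depth = 1   # entered the container

    def handle_endtag(self, tag):
        if tag in VOID_TAGS:
            return
        if self.depth:
            self.depth -= 1  # reaches 0 when the container closes


def count_list_items(html, container_id="all_lists_expand"):
    parser = ListItemCounter(container_id)
    parser.feed(html)
    return parser.count


print(count_list_items(SAMPLE_HTML))
```

With bs4 already imported as above, the same count should be obtainable with len(soup.select('#all_lists_expand li')), again assuming that id is present in the returned fragment.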

Upvotes: 2

KunduK

Reputation: 33384

To get all the anchor tags under "Staff Lists that include this Title", use WebDriverWait with presence_of_all_elements_located(). This will give you 100 links.

from selenium import webdriver
from selenium.webdriver.common.by import By
from selenium.webdriver.support.ui import WebDriverWait
from selenium.webdriver.support import expected_conditions as EC

driver=webdriver.Chrome()
driver.get("https://epl.bibliocommons.com/item/show/2300646980")
elements=WebDriverWait(driver,10).until(EC.presence_of_all_elements_located((By.XPATH,'//h4[@data-test-id="staff-lists-that-include-this-title"]/following::div[1]//li/a')))
print(len(elements))
for ele in elements:
    print(ele.get_attribute('href'))

Output:

https://epl.bibliocommons.com/list/share/114110843_schoolcorps1/1495892159_native_american,_rl_k-3,_spanish_middle_amp_high_school_multcolib_assignments
https://epl.bibliocommons.com/list/share/1467158627_stpl_crystal/1491354799_am_i_seeing_double
https://epl.bibliocommons.com/list/share/568630227_vpl_childrens_teens_info/1490175639_books_just_for_you_-_thought_provoking_amp_charming_ya_reads
https://epl.bibliocommons.com/list/share/1176606007_overdue_finds/1485773789_overdue_finds_episode_39_guilty_pleasures
https://epl.bibliocommons.com/list/share/1312082177_aloha_youthservices/1468001367_its_okay_to_not_be_okay_for_teens
https://epl.bibliocommons.com/list/share/631739687_eplpersonalpicks2/1484211504_epl_personal_picks_ya_novels
https://epl.bibliocommons.com/list/share/186066773_jclemmaf/837858917_favorite_and_my_best
https://epl.bibliocommons.com/list/share/569286917_oplteenbooklists/1476340687_teen_lit_chat_booklist_august_2019
https://epl.bibliocommons.com/list/share/569286917_oplteenbooklists/1459365327_astrology_teen_booklist_books_you_might_like_if_youre_a_virgo
https://epl.bibliocommons.com/list/share/1058529507_pplteen/1258199057_best_back_to_school_reads
https://epl.bibliocommons.com/list/share/1216909347_anna_libraryt/1478214359_ya_novels_about_school
https://epl.bibliocommons.com/list/share/106274081_wplstaffpicks/1477722487_wpl_summer_reads_2019
https://epl.bibliocommons.com/list/share/173100305_jclangelicar/1226682237_amazing_reads_for_teens_and_up
https://epl.bibliocommons.com/list/share/73092242_pickeringteens/1117926097_tag_recommends_continued
https://epl.bibliocommons.com/list/share/73092242_pickeringteens/744582537_tag_recommends_2018
https://epl.bibliocommons.com/list/share/73092242_pickeringteens/1184991797_lets_talk_mental_health
https://epl.bibliocommons.com/list/share/73092242_pickeringteens/822272858_ppl_teens_love,_loss,_and_all_the_feels
https://epl.bibliocommons.com/list/share/73092242_pickeringteens/692256398_aampe_picks
https://epl.bibliocommons.com/list/share/73977058_jclbeckyc/1385964387_the_best_books_of_2019
https://epl.bibliocommons.com/list/share/1059338207_readingadviser_sally/1439607877_books_for_20_somethings-fvrl-2019
https://epl.bibliocommons.com/list/share/279600817_lpl_readersservices/1457670767_2019_squad_goals_read_a_book_set_on_a_college_or_university_campus
https://epl.bibliocommons.com/list/share/631739687_eplpersonalpicks2/1458857587_epl_personal_picks_just_a_little_bit_of_love
https://epl.bibliocommons.com/list/share/1275085237_beaverton_teens/1291469057_female_pov
https://epl.bibliocommons.com/list/share/104627853_princetonpl/1128194327_susans_picks
https://epl.bibliocommons.com/list/share/69155564_kantoniw/376769097_teen_-_terrific_titles
https://epl.bibliocommons.com/list/share/1275085237_beaverton_teens/1292121977_realistic_fiction
https://epl.bibliocommons.com/list/share/1300215227_beaverton_iand/1303358407_books_where_the_parents_are_cool
https://epl.bibliocommons.com/list/share/215214545_multcolib_dianaa/1450141617_casting_a_wide_net_for_tammy_from_multcolib_my_librarian_diana
https://epl.bibliocommons.com/list/share/681590123_scl_kaylin/1030053197_kaylins_picks
https://epl.bibliocommons.com/list/share/173530091_jclhebaha/1171128547_hebahs_staff_picks
https://epl.bibliocommons.com/list/share/1275085237_beaverton_teens/1288931697_recommended_reads_11-12
https://epl.bibliocommons.com/list/share/275252227_martinregionalreads/1369306597_diversity_teenya_books
https://epl.bibliocommons.com/list/share/72152117_steacy_library/1204064657_classic_teen_reads
https://epl.bibliocommons.com/list/share/700233957_snoislelib_suggests/1436626997_harry_potter_y_la_piedra_filosofal
https://epl.bibliocommons.com/list/share/235700377_pomolibrary/1436872057_pomo_picks_-_teen_-_tsrc_2019_-_book_that_is_not_in_a_series_-_grades_9,_10,_11,_12
https://epl.bibliocommons.com/list/share/694280209_kimberlyreads/752020447_level_up_your_reading_-_books_for_gamers_(teen_edition)
https://epl.bibliocommons.com/list/share/1216909347_anna_libraryt/1220688167_ya_reads_for_reluctant_readers
https://epl.bibliocommons.com/list/share/569286917_oplteenbooklists/1405453637_teen_book_chat_april_2019
https://epl.bibliocommons.com/list/share/223261407_burien_teens_read/1424507527_srp_book_talk_glendale_lutheran_8th_grade
https://epl.bibliocommons.com/list/share/1216909347_anna_libraryt/1412382807_top_10_ya_coming-of-age_reads
https://epl.bibliocommons.com/list/share/80402800_vpl_booksjustforyou11/1413011449_vpl_-_books_just_for_you_-_biography,_humour,_inspiration,_short_stories,_and_animal_fiction
https://epl.bibliocommons.com/list/share/760546357_scteenprogramming/1411563307_cmlibrary_suggests_imagicon_2019
https://epl.bibliocommons.com/list/share/1078894377_lisadempster/1411364207_celebrate_your_inner_geek
https://epl.bibliocommons.com/list/share/682768697_arapahoekati/1055224107_published_nanowrimo_authors
https://epl.bibliocommons.com/list/share/1382187347_mollywally/1404738807_mental_health
https://epl.bibliocommons.com/list/share/568630227_vpl_childrens_teens_info/1395459037_books_just_for_you_-_ya_contemporary_amp_mystery
https://epl.bibliocommons.com/list/share/550038607_spl_brittany/1322718057_one_word_titles
https://epl.bibliocommons.com/list/share/1170754297_sppl_recommends/1383661857_no,_you_cant_read_these_books
https://epl.bibliocommons.com/list/share/639095537_sausalito_staff_erin/1377322417_ya_realistic_fiction_for_middle_schoolers
https://epl.bibliocommons.com/list/share/1060442917_readingadviser_jacque/1364177797_teen_favorites
https://epl.bibliocommons.com/list/share/69193241_pepl_knoeske/269126130_ya_reads
https://epl.bibliocommons.com/list/share/155181971_surreylibraries_teens/385766437_hilarity_ensues
https://epl.bibliocommons.com/list/share/1136103357_hfxpl_teens/1374745777_hey_what_are_you_reading
https://epl.bibliocommons.com/list/share/155181971_surreylibraries_teens/1349496509_valentines_day_2019_young_adult_fiction
https://epl.bibliocommons.com/list/share/138070021_surreylibraries_reads/1304148677_staff_picks_what_we_loved_in_2014
https://epl.bibliocommons.com/list/share/80402800_vpl_booksjustforyou11/1365444807_vpl_-_new_adult_-_top_picks
https://epl.bibliocommons.com/list/share/715647058_st8ceyw8/1365437547_recommendations_for_teen_girls
https://epl.bibliocommons.com/list/share/1131250757_lvccld_saharawest/1363494177_geeks_rule_books_for_teens
https://epl.bibliocommons.com/list/share/548538121_spl_merley/1358151383_help_for_anxious_teens
https://epl.bibliocommons.com/list/share/679797892_dbrl_idaf/1355664913_matryoshka_fiction
https://epl.bibliocommons.com/list/share/1315907392_indypl_kirstenw/1315916377_staff_recommendations_great_reads_for_teens
https://epl.bibliocommons.com/list/share/1303998627_tigard_teens/1351425041_put_a_heart_on_it
https://epl.bibliocommons.com/list/share/515946100_tacomalibrary/1343962909_a_book_about_books,_as_part_of_the_extreme_reader_challenge
https://epl.bibliocommons.com/list/share/1216909347_anna_libraryt/1342688089_ya_with_geek_themes
https://epl.bibliocommons.com/list/share/1282688857_indypl_katieb/1285699927_nanowrimo-_a_survival_guide
https://epl.bibliocommons.com/list/share/104627853_princetonpl/1333071229_libfaves
https://epl.bibliocommons.com/list/share/550038607_spl_brittany/1329175977_fresh_starts,_new_beginnings_and_second_chances
https://epl.bibliocommons.com/list/share/710260400_annag_kcmo/1322113517_fandoms
https://epl.bibliocommons.com/list/share/558294898_jclemilyd/1326533547_monticello_youth_services_recommendsya_books
https://epl.bibliocommons.com/list/share/429022740_loganlib_meg/1324424287_2019_reading_challenge
https://epl.bibliocommons.com/list/share/95681271_samcmar/1318184807_mpl_2019_reading_challenge_-_a_one_word_title
https://epl.bibliocommons.com/list/share/768705057_dcpl_teens/1322057871_if_you_like_dumplin
https://epl.bibliocommons.com/list/share/803717002_adult_custom_reading_list/1321396267_omaha_custom_list_page-turners_122018
https://epl.bibliocommons.com/list/share/134340301_vpl_booksjustforyou/1160285087_vpl_-_books_just_for_you_-_fun_reads
https://epl.bibliocommons.com/list/share/1303998627_tigard_teens/1320248908_do_you_ship_them
https://epl.bibliocommons.com/list/share/768705057_dcpl_teens/1030069518_a_fandom_life_for_me
https://epl.bibliocommons.com/list/share/1066057257_mcpl_readerslounge/1314212917_woodneath_staff_picks_babysitters_club_reads
https://epl.bibliocommons.com/list/share/1081387957_pacl_teens/1313796687_tlab_recommends_romance_for_teens
https://epl.bibliocommons.com/list/share/768695927_dcpl_adults/1311059977_dcpl_staff_picks_for_2018
https://epl.bibliocommons.com/list/share/186066773_jclemmaf/1313674757_ya_books_about_teen_writers
https://epl.bibliocommons.com/list/share/888940897_cmlibrary_corvolunteens/1306009547_calians_favorites
https://epl.bibliocommons.com/list/share/344916587_chapel_hill_teenstaff/687974851_unusual_formats
https://epl.bibliocommons.com/list/share/1204935759_jclmegb/1303553797_teen_reads_to_tickle_your_funny_bone_amp_warm_your_heart
https://epl.bibliocommons.com/list/share/95796007_jessicagma/1302711427_book_smack_j%C3%B3lab%C3%B3kafl%C3%B3%C3%B0i%C3%B0_2018_jessica
https://epl.bibliocommons.com/list/share/219559045_kclsaarene/1302650609_best-selling_nanowrimo_winners
https://epl.bibliocommons.com/list/share/569520567_hholley/710149067_opl_staff_picks
https://epl.bibliocommons.com/list/share/491055517_cals_readers/1298323449_nanowrimo_books_that_got_published
https://epl.bibliocommons.com/list/share/73877511_jcltracim/1296589167_nanowrimo_-_published_wrimos
https://epl.bibliocommons.com/list/share/219559045_kclsaarene/1296304497_pizza_and_books_einstein_ms_november_2018
https://epl.bibliocommons.com/list/share/104627853_princetonpl/1295497427_nanowrimo
https://epl.bibliocommons.com/list/share/675410617_orlreads/1295410127_orl_recommends_-_nanowrimo_reads
https://epl.bibliocommons.com/list/share/768705057_dcpl_teens/1294054347_family_stories
https://epl.bibliocommons.com/list/share/1165043747_sppl_teens/1282475677_lets_talk_about_mental_health
https://epl.bibliocommons.com/list/share/685936385_arapahoebridget/723765118_breaking_out_of_nanowrimo_writers_block
https://epl.bibliocommons.com/list/share/1106377937_mckenzingtonc/1277464857_disability_awareness
https://epl.bibliocommons.com/list/share/105396413_youthcollection/1260776227_fall_2018_must-read_ya_novels
https://epl.bibliocommons.com/list/share/105396413_youthcollection/1261651207_ya_books_about_social_anxiety
https://epl.bibliocommons.com/list/share/1244999997_jcls_youth_services/1259372807_libraries_rock_talent_teen_five_star_books
https://epl.bibliocommons.com/list/share/79828372_vpl_informationservice/1254087617_vpl_-_new_adult_fiction
https://epl.bibliocommons.com/list/share/308506797_kclsreads/1253264637_to_all_the_boys_ive_loved_before
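
The wait matters because the list is injected after the initial page load, so an immediate lookup finds nothing. As a rough sketch of the idea behind an explicit wait (not Selenium's actual implementation), the loop below polls a condition until it returns something truthy or a timeout passes. wait_until and FakePage are hypothetical names, and the fake page only simulates content that shows up after a few polls.

```python
import time


def wait_until(condition, timeout=10.0, poll_interval=0.5):
    """Call condition() repeatedly until it returns a truthy value.

    Raises the built-in TimeoutError if the timeout elapses first.
    """
    deadline = time.monotonic() + timeout
    while True:
        result = condition()
        if result:                      # an empty list is falsy, so keep polling
            return result
        if time.monotonic() >= deadline:
            raise TimeoutError(f"condition not met within {timeout} seconds")
        time.sleep(poll_interval)


class FakePage:
    """Hypothetical page whose links only appear on the Nth lookup."""

    def __init__(self, ready_after):
        self.polls = 0
        self.ready_after = ready_after

    def find_links(self):
        self.polls += 1
        if self.polls < self.ready_after:
            return []                   # content not injected yet
        return ["/list/one", "/list/two"]


page = FakePage(ready_after=3)
links = wait_until(page.find_links, timeout=2.0, poll_interval=0.01)
print(len(links), page.polls)
```

In the Selenium code above, WebDriverWait(driver, 10).until(...) plays the role of wait_until, and the expected condition plays the role of page.find_links; on timeout Selenium raises its own TimeoutException rather than the built-in TimeoutError used here.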

Upvotes: 0

CEH

Reputation: 5909

I've scraped your page and written an XPath that will find all of the li elements under 'Staff Lists that include this Title'. Updated to include a wait for all relevant li elements to be present.

WebDriverWait(driver, 10).until(EC.presence_of_all_elements_located((By.XPATH, "//div[h4[text()='Staff Lists that include this Title']]/div[2]/ul/li[@class='']")))
driver.find_elements_by_xpath("//div[h4[text()='Staff Lists that include this Title']]/div[2]/ul/li[not(contains(@class, 'extra'))]")

This XPath selects the main div element that contains the h4 element with the text 'Staff Lists that include this Title'. Then we query div[2], which contains the relevant li items. The final step selects li elements with an empty class name. As we can see from the page source, there are many hidden li elements with the class='extra' attribute. We do not want these li elements, so we filter with not(contains(@class, 'extra')) to get only the li elements without the extra class name.
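
To make the filter concrete without a browser, here is a small sketch that applies the same rule in plain Python to a made-up, well-formed fragment. SAMPLE and visible_items are hypothetical names, and the markup only imitates the structure described above; it is not the real page source.

```python
import xml.etree.ElementTree as ET

# Hypothetical fragment imitating the structure described above.
SAMPLE = """
<div>
  <h4>Staff Lists that include this Title</h4>
  <div>
    <ul>
      <li><a href="/list/one">One</a></li>
      <li class=""><a href="/list/two">Two</a></li>
      <li class="extra"><a href="/list/three">Three</a></li>
      <li class="extra hidden"><a href="/list/four">Four</a></li>
    </ul>
  </div>
</div>
"""


def visible_items(markup):
    """Return li elements whose class attribute does not contain 'extra'.

    Same intent as the XPath predicate li[not(contains(@class, 'extra'))]:
    a missing class attribute is treated as an empty string.
    """
    root = ET.fromstring(markup.strip())
    return [li for li in root.iter("li") if "extra" not in li.get("class", "")]


items = visible_items(SAMPLE)
print(len(items))
print([li.find("a").get("href") for li in items])
```

Note that contains() is a substring test, so a class such as "extras" would also be excluded; that is fine for this page as described, but worth knowing if the class names differ.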

If the above XPath does not work, I also modified another XPath that you posted in your original problem:

WebDriverWait(driver, 10).until(EC.presence_of_all_elements_located((By.XPATH, "//*[@id='rightBar']/div[3]/div/div[2]/ul/li[not(contains(@class, 'extra'))]")))
driver.find_elements_by_xpath("//*[@id='rightBar']/div[3]/div/div[2]/ul/li[not(contains(@class, 'extra'))]")

For the URL you provided, both queries retrieved 5 results.

Upvotes: 1
