SIM

Unable to create an appropriate selector to scrape some specific links

I've written a script in Python using BeautifulSoup to get some specific URLs from the left sidebar of a webpage, under the chapter titled VIDEOS BY YEAR. I can parse those URLs if I use a hard-coded number in my script, as demonstrated below. However, I'd like to grab exactly those URLs without any hard-coded number. In fact, I'm after a CSS selector that does the job. I hope somebody out there can lend a helping hand.

This is what I've tried so far:

import requests
from bs4 import BeautifulSoup

URL = "https://www.wiseowl.co.uk/videos/"
response = requests.get(URL)
soup = BeautifulSoup(response.text,"html5lib")
# [-7:] keeps only the last seven menu links: this is the hard-coded part
for item in soup.select(".woMenuList .woMenuItem a")[-7:]:
    print(item['href'])

It produces the following output:

/videos/year/2011.htm
/videos/year/2012.htm
/videos/year/2013.htm
/videos/year/2014.htm
/videos/year/2015.htm
/videos/year/2016.htm
/videos/year/2017.htm

The HTML elements that contain the URLs:

<ul class="woMenuList">

    <li class="woMenuItem"><a href="/videos/year/2011.htm">2011 (19)</a></li>
    <li class="woMenuItem"><a href="/videos/year/2012.htm">2012 (45)</a></li>
    <li class="woMenuItem"><a href="/videos/year/2013.htm">2013 (29)</a></li>
    <li class="woMenuItem"><a href="/videos/year/2014.htm">2014 (62)</a></li>
    <li class="woMenuItem"><a href="/videos/year/2015.htm">2015 (25)</a></li>
    <li class="woMenuItem"><a href="/videos/year/2016.htm">2016 (46)</a></li>
    <li class="woMenuItem"><a href="/videos/year/2017.htm">2017 (24)</a></li>

</ul>

By the way, all the categories and links use the same kind of classes and tags, which is why I'm stuck. Thanks in advance for taking a look.

Answers (1)

t.m.adam

You can use the *= (substring match) attribute selector to select only the links whose href contains the string '/videos/year'.

import requests
from bs4 import BeautifulSoup

URL = "https://www.wiseowl.co.uk/videos/"
response = requests.get(URL)
soup = BeautifulSoup(response.text,"html5lib")
# [href*='/videos/year'] keeps only links whose href contains that substring
for item in soup.select(".woMenuList .woMenuItem a[href*='/videos/year']"):
    print(item['href'])
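To try the selector without hitting the live site, here is a minimal sketch that parses a local copy of the menu markup quoted in the question. A few assumptions: the extra /videos/excel.htm link is hypothetical, added only to show that a non-year link is filtered out; the stdlib "html.parser" backend is used instead of html5lib so no extra parser package is needed; and urljoin is used to turn the relative hrefs into absolute URLs, which the answer above does not do.

```python
from urllib.parse import urljoin

from bs4 import BeautifulSoup

# Local copy of the menu markup from the question. The first <li>
# (/videos/excel.htm) is a HYPOTHETICAL non-year link, added for contrast.
HTML = """
<ul class="woMenuList">
    <li class="woMenuItem"><a href="/videos/excel.htm">Excel</a></li>
    <li class="woMenuItem"><a href="/videos/year/2011.htm">2011 (19)</a></li>
    <li class="woMenuItem"><a href="/videos/year/2012.htm">2012 (45)</a></li>
    <li class="woMenuItem"><a href="/videos/year/2013.htm">2013 (29)</a></li>
    <li class="woMenuItem"><a href="/videos/year/2014.htm">2014 (62)</a></li>
    <li class="woMenuItem"><a href="/videos/year/2015.htm">2015 (25)</a></li>
    <li class="woMenuItem"><a href="/videos/year/2016.htm">2016 (46)</a></li>
    <li class="woMenuItem"><a href="/videos/year/2017.htm">2017 (24)</a></li>
</ul>
"""
BASE_URL = "https://www.wiseowl.co.uk/videos/"

soup = BeautifulSoup(HTML, "html.parser")

# [href*='...'] matches when the substring appears anywhere in href.
contains = [a["href"] for a in soup.select(".woMenuList .woMenuItem a[href*='/videos/year']")]

# [href^='...'] is stricter: href must start with the given prefix.
starts_with = [a["href"] for a in soup.select(".woMenuList .woMenuItem a[href^='/videos/year/']")]

# urljoin resolves each root-relative href against the page URL.
absolute = [urljoin(BASE_URL, href) for href in starts_with]

for url in absolute:
    print(url)
```

The ^= form anchors the match at the start of href, so a link that merely mentions /videos/year somewhere in the middle of its path would not be picked up. Whether that extra strictness matters depends on the real page, which this sketch does not check.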
