Reputation: 22440
I've written a Python script using BeautifulSoup to get some specific URLs from the left sidebar of a webpage, under a section titled VIDEOS BY YEAR. I can parse those URLs if I use a hardcoded number in my script, as shown below. However, I want to grab exactly those URLs without any hardcoded number. Ideally, I'm looking for a CSS selector that does this. Any help would be appreciated.
This is what I've tried so far:
import requests
from bs4 import BeautifulSoup
URL = "https://www.wiseowl.co.uk/videos/"
response = requests.get(URL)
soup = BeautifulSoup(response.text,"html5lib")
for item in soup.select(".woMenuList .woMenuItem a")[-7:]:
    print(item['href'])
It produces the following results:
/videos/year/2011.htm
/videos/year/2012.htm
/videos/year/2013.htm
/videos/year/2014.htm
/videos/year/2015.htm
/videos/year/2016.htm
/videos/year/2017.htm
The HTML elements that contain the URLs:
<ul class="woMenuList">
<li class="woMenuItem"><a href="/videos/year/2011.htm">2011 (19)</a></li>
<li class="woMenuItem"><a href="/videos/year/2012.htm">2012 (45)</a></li>
<li class="woMenuItem"><a href="/videos/year/2013.htm">2013 (29)</a></li>
<li class="woMenuItem"><a href="/videos/year/2014.htm">2014 (62)</a></li>
<li class="woMenuItem"><a href="/videos/year/2015.htm">2015 (25)</a></li>
<li class="woMenuItem"><a href="/videos/year/2016.htm">2016 (46)</a></li>
<li class="woMenuItem"><a href="/videos/year/2017.htm">2017 (24)</a></li>
</ul>
By the way, all the categories and links share the same classes and tags, which is why I'm stuck. Thanks in advance for taking a look.
Upvotes: 0
Views: 59
Reputation: 15376
You can use the *= attribute selector to match only links whose href contains the string '/videos/year'.
import requests
from bs4 import BeautifulSoup
URL = "https://www.wiseowl.co.uk/videos/"
response = requests.get(URL)
soup = BeautifulSoup(response.text,"html5lib")
for item in soup.select(".woMenuList .woMenuItem a[href*='/videos/year']"):
    print(item['href'])
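To try the selector without hitting the site, here is a minimal offline sketch. It runs against a shortened copy of the markup from the question, and it assumes the built-in "html.parser" instead of "html5lib" so no extra parser is needed. The "/videos/excel.htm" link is hypothetical, added only to show that non-year links are filtered out. It also shows ^= as a possible stricter alternative that matches on the start of the href rather than anywhere in it.

```python
from bs4 import BeautifulSoup

# Shortened sample of the markup from the question. The first link is a
# hypothetical non-year entry, added only to show that it gets filtered out.
HTML = """
<ul class="woMenuList">
<li class="woMenuItem"><a href="/videos/excel.htm">Excel (hypothetical)</a></li>
<li class="woMenuItem"><a href="/videos/year/2011.htm">2011 (19)</a></li>
<li class="woMenuItem"><a href="/videos/year/2012.htm">2012 (45)</a></li>
<li class="woMenuItem"><a href="/videos/year/2017.htm">2017 (24)</a></li>
</ul>
"""

# "html.parser" ships with Python; the original code uses "html5lib".
soup = BeautifulSoup(HTML, "html.parser")

# *= matches when the attribute value contains the substring anywhere.
contains = [a["href"] for a in soup.select(".woMenuItem a[href*='/videos/year']")]

# ^= matches only when the attribute value starts with the given prefix.
starts = [a["href"] for a in soup.select(".woMenuItem a[href^='/videos/year/']")]

print(contains)
print(starts)
```

Both lists should contain only the three year links here. On the real page, ^= may be the safer choice if some unrelated href happens to contain '/videos/year' somewhere in the middle.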
Upvotes: 1