Reputation: 7048
I am using BeautifulSoup 4.4 and Python 3.6.6. I have extracted all the links, but I cannot print only the links whose attributes contain
'class': ['_self']
This is the full set of attributes of the link that I want to pick out of the list of links.
{'href': 'https://www.racingnsw.com.au/news/latest-racing-news/highway-sixtysix-on-right-route/', 'class': ['_self'], 'target': '_self'}
I cannot get the syntax right, although it looks like the attribute examples in the bs4 docs.
import requests as req
import json
from bs4 import BeautifulSoup
url = req.get(
    'https://www.racingnsw.com.au/media-news-premierships/latest-news/')
data = url.content
soup = BeautifulSoup(data, "html.parser")
links = soup.find_all('a')
for item in links:
    print(item['class']='self')
Upvotes: 0
Views: 304
Reputation: 898
BeautifulSoup supports CSS selectors, which let you select elements based on the content of particular attributes. This includes the *= operator, which matches when the attribute value contains the given substring.
import requests as req
from bs4 import BeautifulSoup
url = req.get(
    'https://www.racingnsw.com.au/media-news-premierships/latest-news/')
data = url.content
soup = BeautifulSoup(data, "html.parser")
for items in soup.select('a[class*="_self"]'):
    print(items)
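As a rough offline illustration of how the substring selector compares with an exact class match, here is a minimal sketch. The SAMPLE_HTML markup is made up for this example (only the first href comes from the question), so treat it as an assumption about what the page might look like rather than the real page source. It also shows a guarded version of the loop from the question, using item.get so that links without a class attribute do not raise KeyError.

```python
from bs4 import BeautifulSoup

# Hypothetical markup standing in for the live page (an assumption for this sketch).
SAMPLE_HTML = """
<a class="_self" target="_self" href="https://www.racingnsw.com.au/news/latest-racing-news/highway-sixtysix-on-right-route/">Story</a>
<a class="not_self_link" href="/other/">Other</a>
<a href="/plain/">No class</a>
"""

soup = BeautifulSoup(SAMPLE_HTML, "html.parser")

# Substring match: any <a> whose class attribute contains "_self" anywhere.
substring_hrefs = [a["href"] for a in soup.select('a[class*="_self"]')]

# Exact class match: only <a> tags that have "_self" as one of their classes.
exact_hrefs = [a["href"] for a in soup.find_all("a", class_="_self")]

# Loop in the style of the question; .get avoids KeyError when class is missing.
loop_hrefs = []
for item in soup.find_all("a"):
    if "_self" in item.get("class", []):
        loop_hrefs.append(item["href"])

print(substring_hrefs)
print(exact_hrefs)
print(loop_hrefs)
```

In this sample the substring selector also picks up the link whose class is not_self_link, while the class_ filter and the loop only keep the link whose class is exactly _self. Which behaviour is wanted depends on the class names the real page uses.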
Upvotes: 3