Reputation: 1834
I'm working on parsing this web page.
I've got table = soup.find("div",{"class","accordions"})
to get just the fixtures list (and nothing else). Now I'm trying to loop through the matches one at a time. Each match appears to start with an article tag: <article role="article" about="/fixture/arsenal/2018-apr-01/stoke-city">
However for some reason when I try to use matches = table.findAll("article",{"role","article"})
and then print the length of matches, I get 0.
I've also tried to say matches = table.findAll("article",{"about","/fixture/arsenal"})
but get the same issue.
Is BeautifulSoup unable to parse tags, or am I just using it wrong?
Upvotes: 0
Views: 1834
Reputation: 7238
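You need to pass the attributes as a dictionary: {"role","article"} with a comma is a set, whereas {'role': 'article'} with a colon is a dictionary. There are three ways in which you can get the data you want.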
import requests
from bs4 import BeautifulSoup
r = requests.get('https://www.arsenal.com/fixtures')
soup = BeautifulSoup(r.text, 'lxml')
matches = soup.find_all('article', {'role': 'article'})
print(len(matches))
# 16
Or, equivalently, with a keyword argument:
matches = soup.find_all('article', role='article')
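However, both of these methods also return some extra article tags that aren't Arsenal fixtures. To match only the tags whose about attribute starts with /fixture/arsenal, you can use a CSS selector. (Passing a plain string to find_all won't work here, because a string is compared against the whole attribute value and you need a partial match.)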
matches = soup.select('article[about^="/fixture/arsenal"]')
print(len(matches))
# 13
Also, have a look at the keyword arguments section of the BeautifulSoup documentation; it'll help you get what you want.
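If you want to try the approaches above without hitting the live site, here is a minimal sketch on a small stand-in document. The HTML is hypothetical (only the stoke-city URL comes from the question; the other two about values are made up), and it uses the built-in html.parser so lxml isn't required. The last line shows a regex passed to find_all, which is one way to get a prefix match without a CSS selector.

```python
import re
from bs4 import BeautifulSoup

# Hypothetical stand-in markup; the real page has far more content.
# Only the first "about" value is taken from the question.
html = """
<div class="accordions">
  <article role="article" about="/fixture/arsenal/2018-apr-01/stoke-city">A</article>
  <article role="article" about="/fixture/arsenal/2018-apr-08/southampton">B</article>
  <article role="article" about="/news/some-story">C</article>
</div>
"""
soup = BeautifulSoup(html, "html.parser")

# Exact attribute match, dictionary form.
by_dict = soup.find_all("article", {"role": "article"})

# Exact attribute match, keyword-argument form.
by_kwarg = soup.find_all("article", role="article")

# Prefix match with a CSS attribute selector (note the quoted value).
by_css = soup.select('article[about^="/fixture/arsenal"]')

# Prefix match with find_all, using a compiled regex instead of a string.
by_regex = soup.find_all("article", about=re.compile(r"^/fixture/arsenal"))

print(len(by_dict), len(by_kwarg), len(by_css), len(by_regex))
# prints: 3 3 2 2
```

The first two calls pick up the non-fixture article as well, which mirrors the "extra article tags" problem described above; the last two only keep the tags whose about value begins with /fixture/arsenal.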
Upvotes: 0
Reputation: 904
Try this:
matches = table.findAll('article', attrs={'role': 'article'})
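For what it's worth, the reason the original call found nothing is the comma: {"role","article"} is a Python set, so BeautifulSoup never sees a role/article attribute pair. A quick sketch to check this, using a one-tag stand-in for the page (the html string and the as_set/as_dict names are just for illustration):

```python
from bs4 import BeautifulSoup

# Hypothetical one-tag stand-in for the fixtures container.
html = '<div><article role="article" about="/fixture/arsenal/2018-apr-01/stoke-city">x</article></div>'
table = BeautifulSoup(html, "html.parser").div

as_set = {"role", "article"}    # comma: a set of two strings
as_dict = {"role": "article"}   # colon: an attribute name mapped to its value

print(type(as_set).__name__, type(as_dict).__name__)
# prints: set dict

# With the dictionary, the article is found as expected.
matches = table.find_all("article", attrs=as_dict)
print(len(matches))
# prints: 1
```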
Upvotes: 3