AndyReifman

Reputation: 1834

BeautifulSoup won't parse Article element

I'm working on parsing this web page.

I've got table = soup.find("div",{"class","accordions"}) to get just the fixtures list (and nothing else); however, now I'm trying to loop through the matches one at a time. It looks like each match starts with an article element tag: <article role="article" about="/fixture/arsenal/2018-apr-01/stoke-city">

However, for some reason, when I try to use matches = table.findAll("article",{"role","article"})

and then print the length of matches, I get 0.

I've also tried matches = table.findAll("article",{"about","/fixture/arsenal"}), but I get the same result.

Is BeautifulSoup unable to parse <article> tags, or am I just using it wrong?

Upvotes: 0

Views: 1834

Answers (3)

Keyur Potdar

Reputation: 7238

You need to pass the attributes as a dictionary: {"role","article"} (with a comma instead of a colon) is a set, not a dict. There are three ways in which you can get the data you want.

import requests
from bs4 import BeautifulSoup

r = requests.get('https://www.arsenal.com/fixtures')
soup = BeautifulSoup(r.text, 'lxml')

matches = soup.find_all('article', {'role': 'article'})
print(len(matches))
# 16

Or, equivalently, using a keyword argument:

matches = soup.find_all('article', role='article')

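But both of these methods also return some extra article tags that aren't Arsenal fixtures. If you only want the ones whose about attribute starts with /fixture/arsenal, you need a partial (prefix) match. A plain string passed to find_all must match the whole attribute value, so use a CSS selector (or a compiled regular expression) instead: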

matches = soup.select('article[about^="/fixture/arsenal"]')
print(len(matches))
# 13
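For comparison, find_all can also do the prefix match if you give it a compiled regular expression, because Beautiful Soup applies a regex to attribute values with its search() method (so ^ anchors it to the start). The sketch below runs offline against a small hypothetical HTML snippet instead of the live page; the markup and the second and third about values are made up for illustration. The selector value is quoted because /fixture/arsenal is not a valid unquoted CSS identifier.

```python
import re

from bs4 import BeautifulSoup

# Hypothetical markup standing in for the real fixtures page.
html = """
<div class="accordions">
  <article role="article" about="/fixture/arsenal/2018-apr-01/stoke-city">Stoke City</article>
  <article role="article" about="/fixture/arsenal/example-match">Example</article>
</div>
<article role="article" about="/news/example-story">News</article>
"""
soup = BeautifulSoup(html, "html.parser")

# The regex is applied with search(), so ^ anchors it to the start of the value.
by_regex = soup.find_all("article", about=re.compile(r"^/fixture/arsenal"))

# CSS prefix match; the value is quoted because it starts with "/".
by_css = soup.select('article[about^="/fixture/arsenal"]')

print(len(by_regex), len(by_css))  # 2 2
```

Both approaches skip the /news/ article; which one to use is mostly a matter of taste.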

Also, have a look at the keyword arguments section of the Beautiful Soup documentation; it will help you get what you want.
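A minimal sketch of the keyword-argument filters mentioned above, again using a hypothetical HTML snippet in place of the real page. Any keyword argument Beautiful Soup doesn't recognise is treated as an attribute filter, class_ stands in for the reserved word class, and True matches any tag that has the attribute at all.

```python
from bs4 import BeautifulSoup

# Hypothetical markup standing in for the real fixtures page.
html = """
<div class="accordions">
  <article role="article" about="/fixture/arsenal/2018-apr-01/stoke-city">Stoke City</article>
  <article role="article" about="/fixture/arsenal/example-match">Example</article>
</div>
<article role="article" about="/news/example-story">News</article>
"""
soup = BeautifulSoup(html, "html.parser")

# An unrecognised keyword argument becomes an attribute filter.
by_role = soup.find_all("article", role="article")

# "class" is reserved in Python, so Beautiful Soup accepts class_ instead.
table = soup.find("div", class_="accordions")

# about=True matches any tag that has an about attribute, whatever its value.
with_about = table.find_all("article", about=True)

print(len(by_role), len(with_about))  # 3 2
```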

Upvotes: 0

muzzyq

Reputation: 904

Try this:

matches = table.findAll('article', attrs={'role': 'article'})
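The same attrs dictionary also covers the original goal of walking the fixtures one at a time. This sketch assumes the page structure described in the question (a div with class accordions wrapping the article tags); the HTML string and the second about value are hypothetical, and find_all is the current name for findAll.

```python
from bs4 import BeautifulSoup

# Hypothetical markup standing in for the real fixtures page.
html = """
<div class="accordions">
  <article role="article" about="/fixture/arsenal/2018-apr-01/stoke-city">Stoke City</article>
  <article role="article" about="/fixture/arsenal/example-match">Example</article>
</div>
"""
soup = BeautifulSoup(html, "html.parser")

# Narrow the search to the fixtures container first, as in the question.
table = soup.find("div", attrs={"class": "accordions"})

# attrs is a real dict here, so role="article" is matched as an attribute.
matches = table.find_all("article", attrs={"role": "article"})

for match in matches:
    # Attribute values are read with dictionary-style indexing.
    print(match["about"], match.get_text(strip=True))
```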

Upvotes: 3

internety

Reputation: 374

The reason is that findAll's first argument is the tag name to search for. Refer to the bs4 docs.
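To expand on this: the first positional argument is the tag name, and a second positional argument that is not a dict is treated as a filter on the class attribute. A set such as {"role", "article"} is therefore read as "class is role or article", which the tag in the question doesn't have. This is also likely why the question's {"class","accordions"} lookup still found the div: accordions happened to be its class. The one-tag HTML below is a hypothetical example.

```python
from bs4 import BeautifulSoup

# Hypothetical single fixture tag with an extra class for comparison.
html = '<article role="article" class="fixture">Stoke City</article>'
soup = BeautifulSoup(html, "html.parser")

# {"role", "article"} is a set, not a dict, so it is used as a class filter:
# it looks for class "role" or class "article", and this tag has neither.
as_set = soup.find_all("article", {"role", "article"})

# A plain string in the same position is also a class filter.
as_class = soup.find_all("article", "fixture")

# A dict is matched against attributes, which is what the question intended.
as_dict = soup.find_all("article", {"role": "article"})

print(len(as_set), len(as_class), len(as_dict))  # 0 1 1
```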

Upvotes: 0
