Reputation: 897
I am getting some odd behavior that I do not quite understand. I am hoping someone can explain what is going on.
Consider this metadata:
<meta property="og:title" content="This is the Tesla Semi truck">
<meta name="twitter:title" content="This is the Tesla Semi truck">
This line successfully finds ALL "og" properties and returns a list.
opengraphs = doc.html.head.findAll(property=re.compile(r'^og'))
However, this line fails to do the same thing for the twitter cards.
twitterCards = doc.html.head.findAll(name=re.compile(r'^twitter'))
Why does the first line successfully find all the "og" (Open Graph) tags, while the second line fails to find the Twitter cards?
Upvotes: 2
Views: 1405
Reputation: 142919
The problem is name=, which has a special meaning in findAll: it is used to match the tag name, which in your code is meta.
You have to pass "meta" as the tag name and use a dictionary with the "name" key to match the attribute.
Example with different items.
from bs4 import BeautifulSoup
import re
data='''
<meta property="og:title" content="This is the Tesla Semi truck">
<meta property="twitter:title" content="This is the Tesla Semi truck">
<meta name="twitter:title" content="This is the Tesla Semi truck">
'''
head = BeautifulSoup(data, 'html.parser') # name the parser explicitly to avoid the "no parser specified" warning
print(head.findAll(property=re.compile(r'^og'))) # OK
print(head.findAll(property=re.compile(r'^tw'))) # OK
print(head.findAll(name=re.compile(r'^meta'))) # OK
print(head.findAll(name=re.compile(r'^tw'))) # empty
print(head.findAll('meta', {'name': re.compile(r'^tw')})) # OK
Upvotes: 5
Reputation: 474131
This is because name is the tag-name argument of find_all(), which means that in this case BeautifulSoup looks for elements whose tag names start with twitter.
In order to specify that you actually mean an attribute, use:
doc.html.head.find_all(attrs={'name': re.compile(r'^twitter')})
Or, via a CSS selector:
doc.html.head.select("[name^=twitter]")
where ^= means "starts with".
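As a quick check, here is a minimal, self-contained sketch that runs both approaches against a small document. The HTML string (including the extra viewport tag) and the variable names are made up for illustration, and html.parser is just one parser choice among several.

```python
import re
from bs4 import BeautifulSoup

# Hypothetical sample document; the viewport tag is there only to show
# that non-twitter "name" attributes are not matched.
html = '''
<html><head>
<meta property="og:title" content="This is the Tesla Semi truck">
<meta name="twitter:title" content="This is the Tesla Semi truck">
<meta name="viewport" content="width=device-width">
</head></html>
'''

doc = BeautifulSoup(html, "html.parser")

# attrs= filters on the HTML attribute called "name", not on the tag name.
by_attrs = doc.html.head.find_all(attrs={"name": re.compile(r"^twitter")})

# Equivalent CSS attribute selector: "name" starts with "twitter".
by_css = doc.html.head.select('meta[name^="twitter"]')

# Passing name= as a keyword filters on the tag name instead, so nothing matches.
by_tag_name = doc.html.head.find_all(name=re.compile(r"^twitter"))

twitter_titles = [tag["content"] for tag in by_attrs]
print(twitter_titles)
```

Both by_attrs and by_css should contain only the twitter:title tag, while by_tag_name stays empty because no tag is named twitter-something.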
Upvotes: 3