Python Beautiful Soup Extracting HTML Meta Data

Question

I am getting some odd behavior that I do not quite understand. I am hoping someone can explain what is going on.

Consider this metadata:

This line successfully finds ALL "og" properties and returns a list.

opengraphs = doc.html.head.findAll(property=re.compile(r'^og'))

However, this line fails to do the same thing for the twitter cards.

twitterCards = doc.html.head.findAll(name=re.compile(r'^twitter'))

Why does the first line find all the "og" (Open Graph) properties, while the second line fails to find the Twitter cards?

furas · Accepted Answer

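The problem is name=, which has a special meaning in findAll: it matches the tag name (in your document that is meta), not the HTML attribute called name. That is why a pattern like ^twitter finds nothing.

To filter on the name attribute, pass "meta" as the tag and put "name" in an attribute dictionary.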

Example with different items.

from bs4 import BeautifulSoup
import re

data='''



'''

head = BeautifulSoup(data, 'html.parser')  # name the parser explicitly to avoid the "no parser specified" warning

print(head.findAll(property=re.compile(r'^og'))) # OK
print(head.findAll(property=re.compile(r'^tw'))) # OK

print(head.findAll(name=re.compile(r'^meta'))) # OK
print(head.findAll(name=re.compile(r'^tw')))   # empty

print(head.findAll('meta', {'name': re.compile(r'^tw')})) # OK
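For anyone who wants to run this end to end, here is a minimal self-contained sketch of the same idea. The meta tags and their content values are made-up sample data for illustration, not the markup from the question. It uses find_all, the current spelling of findAll, and passes the filter through attrs so that it applies to the HTML attribute rather than the tag name.

```python
import re
from bs4 import BeautifulSoup

# Hypothetical sample markup (assumption: invented for this example only).
html = '''
<html><head>
<meta property="og:title" content="Example title">
<meta property="og:type" content="article">
<meta name="twitter:card" content="summary">
<meta name="twitter:title" content="Example title">
<meta name="description" content="Not a card">
</head></html>
'''

soup = BeautifulSoup(html, 'html.parser')

# name= as a keyword filters on the TAG name (html, head, meta),
# so a pattern starting with "twitter" matches nothing.
by_tag_name = soup.find_all(name=re.compile(r'^twitter'))

# Filtering through attrs targets the HTML attribute called "name".
twitter_cards = soup.find_all('meta', attrs={'name': re.compile(r'^twitter')})

# "property" is not a reserved argument, so the keyword form works as-is.
opengraphs = soup.find_all('meta', property=re.compile(r'^og'))

# Collect the matched tags into a plain dict of name -> content.
twitter = {tag['name']: tag['content'] for tag in twitter_cards}

print(by_tag_name)
print(twitter)
print([tag['property'] for tag in opengraphs])
```

The attrs form is also the safe choice for any other attribute whose name collides with an argument of find_all.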
