Reputation: 374
I am trying to extract the Open Graph ("og") properties from a website's meta tags. What I want is a list of every property value in the document that starts with "og".
What I've tried is:
soup.find_all("meta", property="og:")
and
soup.find_all("meta", property="og")
But neither finds anything unless I pass the complete property value (e.g. "og:video:url").
A few examples are:
<meta content="https://www.youtube.com/embed/Rv9hn4IGofM" property="og:video:url"/>,
<meta content="https://www.youtube.com/embed/Rv9hn4IGofM" property="og:video:secure_url"/>,
<meta content="text/html" property="og:video:type"/>,
<meta content="1280" property="og:video:width"/>,
<meta content="720" property="og:video:height"/>
Expected output would be:
l = ["og:video:url", "og:video:secure_url", "og:video:type", "og:video:width", "og:video:height"]
How can I do this?
Thank you
Upvotes: 1
Views: 475
Reputation: 6554
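Use the CSS attribute-prefix selector meta[property^="og:"], which matches only <meta> tags whose property value starts with "og:" (plain meta[property] would also return non-Open-Graph tags such as property="article:author"):
from bs4 import BeautifulSoup
soup = BeautifulSoup(html, "html.parser")
metas = soup.select('meta[property^="og:"]')
prop_values = [m["property"] for m in metas]
print(prop_values)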
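A self-contained sketch of the prefix-selector idea, assuming BeautifulSoup 4 with its Soup Sieve-backed select() method. The HTML below is a hypothetical sample that mixes og:* tags with tags that should be skipped:

```python
from bs4 import BeautifulSoup

# Hypothetical sample page: og:* tags mixed with tags that should be skipped
html = """
<html><head>
<meta charset="utf-8"/>
<meta content="Jane" property="article:author"/>
<meta content="https://www.youtube.com/embed/Rv9hn4IGofM" property="og:video:url"/>
<meta content="text/html" property="og:video:type"/>
<meta content="1280" property="og:video:width"/>
</head></html>
"""

soup = BeautifulSoup(html, "html.parser")

# [property^="og:"] keeps only tags whose property value begins with "og:"
og_props = [m["property"] for m in soup.select('meta[property^="og:"]')]
print(og_props)
```

The charset tag has no property attribute and article:author has the wrong prefix, so only the three og:* values are printed, in document order.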
Upvotes: 2
Reputation: 20098
You can pass a function as the property filter and check each value for "og":
...
soup = BeautifulSoup(html, "html.parser")
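# The filter receives None for <meta> tags without a property attribute, so guard against it
og_elements = [
    tag["property"]
    for tag in soup.find_all("meta", property=lambda t: t is not None and t.startswith("og:"))
]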
print(og_elements)
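As an alternative to a function, find_all also accepts a compiled regular expression as an attribute filter. BeautifulSoup matches it with search(), so the ^ anchor is what restricts it to a prefix, and tags with no property attribute simply fail to match. A minimal sketch, using a hypothetical sample page:

```python
import re

from bs4 import BeautifulSoup

# Hypothetical sample page; "blog:type" contains "og" but does not start with it
html = """
<html><head>
<meta charset="utf-8"/>
<meta content="x" property="blog:type"/>
<meta content="1280" property="og:video:width"/>
<meta content="720" property="og:video:height"/>
</head></html>
"""

soup = BeautifulSoup(html, "html.parser")

# The regex is applied with .search(), so ^ anchors it to the start of the value
og_elements = [
    tag["property"]
    for tag in soup.find_all("meta", property=re.compile(r"^og:"))
]
print(og_elements)
```

The anchored pattern skips blog:type, which a plain substring test such as "og" in t would have accepted.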
Upvotes: 1
Reputation: 20050
Is this what you want?
from bs4 import BeautifulSoup
sample = """
<html>
<body>
<meta content="https://www.youtube.com/embed/Rv9hn4IGofM" property="og:video:url"/>,
<meta content="https://www.youtube.com/embed/Rv9hn4IGofM" property="og:video:secure_url"/>,
<meta content="text/html" property="og:video:type"/>,
<meta content="1280" property="og:video:width"/>,
<meta content="720" property="og:video:height"/>
</body>
</html>
"""
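# property=True skips <meta> tags that have no property attribute (e.g. <meta charset>)
soup = BeautifulSoup(sample, "html.parser")
print([m["property"] for m in soup.find_all("meta", property=True) if m["property"].startswith("og:")])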
Output:
['og:video:url', 'og:video:secure_url', 'og:video:type', 'og:video:width', 'og:video:height']
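On a real page, find_all("meta") also returns tags such as <meta charset="utf-8"/> that have no property attribute, and m["property"] raises KeyError for them. A minimal sketch that tolerates such tags by reading the attribute with Tag.get (the HTML is a hypothetical sample):

```python
from bs4 import BeautifulSoup

# Hypothetical page where some <meta> tags have no property attribute
html = """
<html><head>
<meta charset="utf-8"/>
<meta content="width=device-width" name="viewport"/>
<meta content="text/html" property="og:video:type"/>
<meta content="720" property="og:video:height"/>
</head></html>
"""

soup = BeautifulSoup(html, "html.parser")

# Tag.get returns None instead of raising KeyError when the attribute is missing
props = [m.get("property") for m in soup.find_all("meta")]
og_props = [p for p in props if p is not None and p.startswith("og:")]
print(og_props)
```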
Upvotes: 1