GILO

Reputation: 2623

How to get the author from a Wikimedia Commons page with BS4

Hi, I have been having trouble getting the author of a Wikimedia photo. bs4's find always returns None and I'm pretty stuck. Could someone show me some code that works?

Example wikimedia: https://commons.wikimedia.org/wiki/File:Golden_Retriever_Carlos_(10581910556).jpg

[Screenshot: Wikimedia file summary table]

My aim is to get the author's name and the corresponding link.

Current code

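import requests
from bs4 import BeautifulSoup

url = "https://commons.wikimedia.org/wiki/File:Golden_Retriever_Carlos_(10581910556).jpg"
html_content = requests.get(url).text

soup = BeautifulSoup(html_content, "lxml")

# This returns None, though
table = soup.find("table", {'class': "fileinfotpl-type-information toccolours vevent mw-content-ltr"})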
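One plausible cause (not confirmed against the live page): when the class value passed to find contains spaces, BeautifulSoup compares it with the whole class attribute as an exact string, so a different class order or an extra class on the page makes the lookup return None. Matching a single class with a CSS selector avoids that. The sketch below runs against a hypothetical, trimmed-down stand-in for the summary table (the HTML and the example.org link are assumptions, not the real page):

```python
from bs4 import BeautifulSoup

# Hypothetical stand-in for the Commons summary table; the class order
# deliberately differs from the string used in the question.
HTML = """
<table class="fileinfotpl-type-information vevent toccolours mw-content-ltr">
  <tr><td>Description</td><td>A golden retriever</td></tr>
  <tr><td>Author</td><td><a href="https://example.org/jane">Jane Doe</a></td></tr>
</table>
"""

soup = BeautifulSoup(HTML, "html.parser")

# Exact-string match on the whole class attribute: fails when the order differs.
exact = soup.find(
    "table",
    {"class": "fileinfotpl-type-information toccolours vevent mw-content-ltr"},
)

# CSS selector on one class: matches regardless of the other classes present.
table = soup.select_one("table.fileinfotpl-type-information")

print(exact)              # None
print(table is not None)  # True
```

If the selector still finds nothing on the real page, printing a slice of html_content is a quick way to check what the server actually returned.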

Upvotes: 1

Views: 82

Answers (2)

Andrej Kesely

Reputation: 195468

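import requests
from bs4 import BeautifulSoup

url = 'https://commons.wikimedia.org/wiki/File:Golden_Retriever_Carlos_(10581910556).jpg'
soup = BeautifulSoup(requests.get(url).content, 'html.parser')
# Find the "Author" label cell, then print the text of the next <td> (the value cell).
# :-soup-contains() is the current Soup Sieve spelling of the deprecated :contains().
print(soup.select_one('td:-soup-contains("Author")').find_next('td').get_text(strip=True))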

Prints:

Dirk Vorderstraße
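The question also asks for the author's link. A minimal sketch that extends the same idea (find the cell labelled "Author" by its text, then take the next td) and also reads the first link's href. It assumes the author is the first link in that cell, which may not hold for every file. The HTML and the get_author helper are hypothetical; urljoin turns relative or protocol-relative hrefs into absolute URLs:

```python
from urllib.parse import urljoin

from bs4 import BeautifulSoup

# Hypothetical stand-in for the Author row; the relative href mimics a
# link to a user page on the same wiki.
HTML = """
<table>
  <tr><td>Author</td><td><a href="/wiki/User:Jane_Doe">Jane Doe</a></td></tr>
</table>
"""

BASE_URL = "https://commons.wikimedia.org/"


def get_author(soup, base_url=BASE_URL):
    """Return (name, absolute_link_or_None) for the Author row, or None if absent."""
    # First <td> whose text contains "Author" (Soup Sieve >= 2.1 syntax).
    label = soup.select_one('td:-soup-contains("Author")')
    if label is None:
        return None
    # The value cell is the next <td> in document order.
    cell = label.find_next("td")
    if cell is None:
        return None
    link = cell.find("a", href=True)
    if link is None:
        # Author given as plain text, without a link.
        return cell.get_text(strip=True), None
    # Resolve relative or protocol-relative hrefs against the wiki's base URL.
    return link.get_text(strip=True), urljoin(base_url, link["href"])


soup = BeautifulSoup(HTML, "html.parser")
print(get_author(soup))  # ('Jane Doe', 'https://commons.wikimedia.org/wiki/User:Jane_Doe')
```

On the live page you would build soup from requests.get(url).content as in the answer above and call get_author(soup).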

Upvotes: 1

bigbounty

Reputation: 17368

from bs4 import BeautifulSoup
import requests

res = requests.get("https://commons.wikimedia.org/wiki/File:Golden_Retriever_Carlos_(10581910556).jpg")

soup = BeautifulSoup(res.text, "html.parser")

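# Assumes the "Author" row is the last row of the summary table
author_row = soup.find("table", class_="fileinfotpl-type-information toccolours vevent mw-content-ltr").find("tbody").find_all("tr")[-1]

# The author's name is in the last cell of that row
print(author_row.find_all("td")[-1].get_text(strip=True))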

Output:

Dirk Vorderstraße
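A different route, used by neither answer: the MediaWiki Action API can return a file's extmetadata (action=query, prop=imageinfo, iiprop=extmetadata), whose Artist entry holds the author as an HTML fragment. This avoids depending on the page's table layout. The sketch below is an assumption-laden outline: the SAMPLE response is hypothetical and trimmed, the Artist markup varies per file, and the User-Agent string is a placeholder (Wikimedia asks API clients to send a descriptive one). Only parse_artist is exercised here; fetch_extmetadata makes a network call and is not run on import.

```python
from urllib.parse import urljoin

import requests
from bs4 import BeautifulSoup

API_URL = "https://commons.wikimedia.org/w/api.php"
# Placeholder User-Agent; replace with your own tool name and contact info.
HEADERS = {"User-Agent": "example-author-fetcher/0.1 (you@example.org)"}


def fetch_extmetadata(file_title):
    """Query the Action API for a file's extmetadata (network call)."""
    params = {
        "action": "query",
        "titles": file_title,
        "prop": "imageinfo",
        "iiprop": "extmetadata",
        "format": "json",
    }
    resp = requests.get(API_URL, params=params, headers=HEADERS, timeout=10)
    resp.raise_for_status()
    return resp.json()


def parse_artist(api_json, base_url="https://commons.wikimedia.org/"):
    """Return (name, link_or_None) from the first Artist field found, or None."""
    pages = api_json.get("query", {}).get("pages", {})
    for page in pages.values():
        for info in page.get("imageinfo", []):
            artist = info.get("extmetadata", {}).get("Artist")
            if artist is None:
                continue
            # "value" is an HTML fragment, often a link to the author's profile.
            fragment = BeautifulSoup(artist["value"], "html.parser")
            link = fragment.find("a", href=True)
            href = urljoin(base_url, link["href"]) if link else None
            return fragment.get_text(strip=True), href
    return None


# Hypothetical, trimmed response in the default (formatversion=1) shape.
SAMPLE = {
    "query": {
        "pages": {
            "12345": {
                "title": "File:Example.jpg",
                "imageinfo": [
                    {
                        "extmetadata": {
                            "Artist": {
                                "value": '<a href="//example.org/people/jane">Jane Doe</a>',
                                "source": "commons-desc-page",
                            }
                        }
                    }
                ],
            }
        }
    }
}

print(parse_artist(SAMPLE))  # ('Jane Doe', 'https://example.org/people/jane')
```

Against the live API this would be parse_artist(fetch_extmetadata("File:Golden_Retriever_Carlos_(10581910556).jpg")); the result has not been checked here.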

Upvotes: 0
