Reputation: 2623
Hi, I've been having trouble getting the author from a Wikimedia Commons photo page. bs4's find always returns None and I'm stuck. Could someone show me some code that works?
Example Wikimedia Commons page: https://commons.wikimedia.org/wiki/File:Golden_Retriever_Carlos_(10581910556).jpg
My aim is to get the author's name and the corresponding link.
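Current code:
import requests
from bs4 import BeautifulSoup
url = "https://commons.wikimedia.org/wiki/File:Golden_Retriever_Carlos_(10581910556).jpg"
html_content = requests.get(url).text
soup = BeautifulSoup(html_content, "lxml")
# This returns None, though
table = soup.find("table", {'class': "fileinfotpl-type-information toccolours vevent mw-content-ltr"})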
Upvotes: 1
Views: 82
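One thing worth ruling out in the question's lookup: when the class filter is a string containing spaces, BeautifulSoup compares it against the tag's entire class attribute, so a different class order or a missing or extra class makes find() return None. The sketch below uses a made-up table (hypothetical markup, not the real Commons page) to contrast that with matching a single class or using a CSS selector. This is not confirmed as the cause here, since the second answer uses the same exact string successfully; it simply shows a less brittle way to locate the table.

```python
from bs4 import BeautifulSoup

# Hypothetical markup: some of the question's classes, in a different order.
html = (
    '<table class="toccolours fileinfotpl-type-information vevent">'
    "<tr><td>Author</td><td>Someone</td></tr></table>"
)
soup = BeautifulSoup(html, "html.parser")

# A class string containing spaces is compared with the whole class attribute,
# so a different order or a missing/extra class gives None.
exact = soup.find(
    "table", class_="fileinfotpl-type-information toccolours vevent mw-content-ltr"
)

# Matching one class, or several via a CSS selector, is order-independent.
by_one_class = soup.find("table", class_="fileinfotpl-type-information")
by_selector = soup.select_one("table.fileinfotpl-type-information.toccolours")

print(exact)                     # None
print(by_one_class is not None)  # True
print(by_selector is not None)   # True
```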
Reputation: 195468
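import requests
from bs4 import BeautifulSoup
url = 'https://commons.wikimedia.org/wiki/File:Golden_Retriever_Carlos_(10581910556).jpg'
soup = BeautifulSoup(requests.get(url).content, 'html.parser')
# Find the first <td> whose text contains "Author" (the row label), then take the next <td> (the value).
# :-soup-contains() is Soup Sieve's current name for the deprecated :contains() pseudo-class.
print(soup.select_one('td:-soup-contains("Author")').find_next('td').get_text(strip=True))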
Prints:
Dirk Vorderstraße
Upvotes: 1
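The selector answer prints only the name, while the question also asks for the link. Below is a minimal sketch that reuses the same label-then-next-cell lookup and also returns the href, resolved to an absolute URL. It assumes the author's link, when present, is the first a tag with an href in the value cell; many Flickr-imported files look like that, but it is not guaranteed. The function name author_from_html and the default base URL are illustrative choices, not part of bs4.

```python
from urllib.parse import urljoin

from bs4 import BeautifulSoup


def author_from_html(html, base_url="https://commons.wikimedia.org"):
    """Return (author_name, author_url) from a Commons file page, or (None, None)."""
    soup = BeautifulSoup(html, "html.parser")
    # Same lookup as the answer: the label cell, then the next <td> after it.
    label = soup.select_one('td:-soup-contains("Author")')
    if label is None:
        return None, None
    value = label.find_next("td")
    if value is None:
        return None, None
    # Assumption: the author's link, if any, is the first <a href> in the value cell.
    link = value.find("a", href=True)
    # urljoin resolves relative (/wiki/User:...) and protocol-relative (//...) hrefs
    # and leaves absolute ones unchanged.
    href = urljoin(base_url, link["href"]) if link else None
    return value.get_text(strip=True), href
```

To use it on the real page, pass in the fetched HTML, for example author_from_html(requests.get(url).text). If the request is rejected, sending a descriptive User-Agent header, as Wikimedia's User-Agent policy asks, may help.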
Reputation: 17368
from bs4 import BeautifulSoup
import requests
res = requests.get("https://commons.wikimedia.org/wiki/File:Golden_Retriever_Carlos_(10581910556).jpg")
soup = BeautifulSoup(res.text, "html.parser")
# Locate the Information table and take its last row, which on this page is the Author row.
author_row = soup.find("table", class_="fileinfotpl-type-information toccolours vevent mw-content-ltr").find("tbody").find_all("tr")[-1]
# The last cell in that row holds the author's name.
print(author_row.find_all("td")[-1].get_text(strip=True))
Output:
Dirk Vorderstraße
Upvotes: 0
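The row-based answer depends on Author being the last row of the table, which holds for this file but may not for files whose Information template has extra rows such as Permission. A hedged alternative, using the hypothetical helper information_fields, reads every row into a dict keyed by its label cell so the author can be looked up by name. It assumes the label is the first direct td of each row and the value is the second; rows of tables nested inside a cell are also visited, so a nested label could overwrite an earlier entry.

```python
from bs4 import BeautifulSoup


def information_fields(html):
    """Map each row label of the Information table to the text of its value cell."""
    soup = BeautifulSoup(html, "html.parser")
    # A single class matches regardless of class order or extra classes.
    table = soup.find("table", class_="fileinfotpl-type-information")
    fields = {}
    if table is None:
        return fields
    for row in table.find_all("tr"):
        # Only the row's own cells, not cells of tables nested inside them.
        cells = row.find_all("td", recursive=False)
        if len(cells) >= 2:
            fields[cells[0].get_text(strip=True)] = cells[1].get_text(strip=True)
    return fields
```

With the page HTML in hand, information_fields(html).get("Author") returns the author text, or None if no row has that label.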