Reputation: 1665
I want to make a Python list of all of Vincent van Gogh's paintings out of the JSON file from a Wikipedia API call. Here is my URL that I use to make the request:
As you can see if you open the URL in your browser, it's a huge blob of text. How can I begin to extract the titles of paintings from this massive JSON response? I did a lot of research before asking this question and tried several approaches without success. It would help if the JSON were a dictionary I could easily work with, but I can't make sense of its structure. How would you extract the names of the paintings from it?
Upvotes: 2
Views: 4732
Reputation: 4006
Here is a quick way to get your list into a pandas DataFrame:
import pandas as pd
url = 'http://en.wikipedia.org/wiki/List_of_works_by_Vincent_van_Gogh'
df = pd.read_html(url, attrs={"class": "wikitable"})[0] # 0 is for the 1st table in this particular page
df.head()
Upvotes: 0
Reputation: 473763
Instead of parsing the results of JSON API calls directly, use a Python wrapper:
import wikipedia
page = wikipedia.page("List_of_works_by_Vincent_van_Gogh")
print(page.links)  # titles of the pages linked from the article
There are also other clients and wrappers.
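If you do want to work with the raw JSON yourself, here is a minimal sketch of walking the parsed dictionary. It assumes the request was a MediaWiki `action=query&prop=links` call, which nests results under `query` → `pages` → page ID → `links`. The question's actual URL isn't shown, so that shape is an assumption; `SAMPLE_RESPONSE` and `extract_link_titles` are made-up names for illustration.

```python
import json

# Hypothetical, heavily trimmed stand-in for the API reply (assumed shape).
SAMPLE_RESPONSE = """
{
  "query": {
    "pages": {
      "12345": {
        "pageid": 12345,
        "title": "List of works by Vincent van Gogh",
        "links": [
          {"ns": 0, "title": "The Starry Night"},
          {"ns": 0, "title": "The Potato Eaters"},
          {"ns": 14, "title": "Category:Vincent van Gogh"}
        ]
      }
    }
  }
}
"""


def extract_link_titles(data, namespace=0):
    """Return the titles of links in the given namespace from a parsed reply."""
    titles = []
    # "pages" is keyed by page ID, so loop over the values instead of guessing the key.
    for page in data.get("query", {}).get("pages", {}).values():
        for link in page.get("links", []):
            # Namespace 0 is ordinary articles; this drops categories, templates, etc.
            if link.get("ns") == namespace:
                titles.append(link["title"])
    return titles


data = json.loads(SAMPLE_RESPONSE)  # json.loads turns the text into plain dicts and lists
link_titles = extract_link_titles(data)
print(link_titles)
```

Keep in mind that a page's links include every linked article, not only paintings, so the result would still need filtering.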
Alternatively, here's an option using the BeautifulSoup HTML parser:
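>>> from urllib.request import urlopen
>>> from bs4 import BeautifulSoup
>>> url = "http://en.wikipedia.org/wiki/List_of_works_by_Vincent_van_Gogh"
>>> soup = BeautifulSoup(urlopen(url), "html.parser")  # name the parser explicitly
>>> table = soup.find('table', class_="wikitable")  # first table of works
>>> for row in table.find_all('tr')[1:]:  # skip the header row
...     print(row.find_all('td')[1].text)  # second cell holds the title
...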
Still Life with Cabbage and Clogs
Crouching Boy with Sickle, Black chalk and watercolor
Woman Sewing, Watercolor
Woman with White Shawl
...
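Since the question asks for a Python list rather than printed output, the same loop can collect the titles instead. This is only a sketch: `SAMPLE_HTML` is a made-up miniature of the table so the example runs without a network call, and `titles_from_table` is a hypothetical helper name. It also skips rows that have too few cells, which the loop above would fail on with an `IndexError`.

```python
from bs4 import BeautifulSoup

# Hypothetical miniature of the page's first "wikitable"; the real table has more columns.
SAMPLE_HTML = """
<table class="wikitable">
  <tr><th>No.</th><th>Title</th></tr>
  <tr><td>1</td><td>Still Life with Cabbage and Clogs</td></tr>
  <tr><td>2</td><td> Woman with White Shawl </td></tr>
  <tr><td>row with a single cell</td></tr>
</table>
"""


def titles_from_table(html, column=1):
    """Collect the text of one column from the first wikitable into a list."""
    soup = BeautifulSoup(html, "html.parser")
    table = soup.find("table", class_="wikitable")
    if table is None:
        return []
    titles = []
    for row in table.find_all("tr")[1:]:  # skip the header row
        cells = row.find_all("td")
        if len(cells) > column:  # ignore rows without enough cells
            titles.append(cells[column].get_text().strip())
    return titles


paintings = titles_from_table(SAMPLE_HTML)
print(paintings)
```

To run it against the live page, pass the downloaded HTML (for example `urlopen(url).read()`) instead of `SAMPLE_HTML`.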
Upvotes: 6