LeLoupSolitaire

Reputation: 103

Get all the data from a REST API including nested API links

I am trying to retrieve JSON data from the REST API SWAPI, which has information about people, films, starships and planets in the Star Wars universe.

Here is my code:

import requests
import pandas as pd

total_results = []

for page_num in range(1, 7):
    # Build the URL and download the results
    url = "https://swapi.dev/api/people/?page=" + str(page_num)
    print("Downloading", url)
    response = requests.get(url)
    data = response.json()
    total_results = total_results + data['results']

print("We have", len(total_results), "total results")

SW_people_df = pd.json_normalize(total_results)
SW_people_df.head()
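
As a side note, the number of pages does not have to be hardcoded: each paginated response from swapi.dev also carries a next link (null on the last page), so the loop can simply follow it. A minimal sketch of that variant, assuming the response keys next and results:

import requests
import pandas as pd

total_results = []
url = "https://swapi.dev/api/people/"

# keep downloading pages until the API reports there is no next page
while url:
    print("Downloading", url)
    data = requests.get(url).json()
    total_results.extend(data["results"])
    url = data.get("next")  # None on the last page

SW_people_df = pd.json_normalize(total_results)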

Here is what the dataframe looks like:

  name            height mass hair_color skin_color   eye_color birth_year gender  species                              url
0 Luke Skywalker  172    77   blond      fair         blue      19BBY      male    []                                   http://swapi.dev/api/people/1/
1 C-3PO           167    75   n/a        gold         yellow    112BBY     n/a     ['http://swapi.dev/api/species/2/'] http://swapi.dev/api/people/2/
2 R2-D2           96     32   n/a        white, blue  red       33BBY      n/a     ['http://swapi.dev/api/species/2/'] http://swapi.dev/api/people/3/
3 Darth Vader     202    136  none       white        yellow    41.9BBY    male    []                                   http://swapi.dev/api/people/4/
4 Leia Organa     150    49   brown      light        brown     19BBY      female  []                                   http://swapi.dev/api/people/5/

My question:

Is it possible to retrieve the data from the API including the nested links? I.e., can I get the actual JSON data behind the nested links in the column SW_people_df['species'] instead of just a list of links?
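
For reference, each of those nested links is itself a small JSON record that can be fetched directly; a quick sketch using one of the species URLs from the table above:

import requests

# peek at the JSON record behind one of the nested species links
species_json = requests.get("https://swapi.dev/api/species/2/").json()
print(species_json["name"])
print(list(species_json.keys()))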

Thank you!

Upvotes: 0

Views: 563

Answers (1)

Rob Raymond

Reputation: 31236

Interesting requirement:

  • download all the people (same approach as yours, just more compressed code)
  • check each of the columns to see if it contains a link ("http")
  • create a dictionary keyed by each column that contains a link, where the value is a dataframe built by concatenating the results of all the links in that column
  • now you have all the data, so you can merge/join and analyse across the data categories
import requests
import pandas as pd

# people - pages 1 to 7
dfp = pd.concat([pd.json_normalize(requests.get(f"https://swapi.dev/api/people/?page={p}").json()["results"]) for p in range(1,7)])


# get all the related data from urls against ppl
linkeddf = {c:pd.concat([
    pd.json_normalize(requests.get(u).json()) for u in dfp[c].explode().dropna().unique()
]) for c in dfp.columns if dfp[c].explode().str.contains("http").any() and c!="url"}
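# (not in the original answer: linkeddf should now hold one dataframe per link-bearing
#  column of the people data, e.g. "homeworld", "films", "species", "vehicles", "starships")
# print(list(linkeddf.keys()))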


# join ppl to homeworld
dfp.merge(linkeddf["homeworld"], left_on="homeworld", right_on="url", suffixes=("_person","_world"))

# what films has a skywalker been in?
(dfp.explode("films").merge(linkeddf["films"], left_on="films", right_on="url", suffixes=("_person","_film"))
 .loc[:,["name","title"]]
 .query("name.str.contains('Sky')")
)

output

                 name                    title
0      Luke Skywalker               A New Hope
17     Luke Skywalker  The Empire Strikes Back
33     Luke Skywalker       Return of the Jedi
53     Luke Skywalker      Revenge of the Sith
61   Anakin Skywalker      Revenge of the Sith
79   Anakin Skywalker       The Phantom Menace
94     Shmi Skywalker       The Phantom Menace
115  Anakin Skywalker     Attack of the Clones
123    Shmi Skywalker     Attack of the Clones
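
To tie this back to the species column from the original question, the same pattern should work: species is a list column like films, so it needs explode first, and both frames have a name column, hence the suffixes. A sketch along those lines (assuming linkeddf picked up the species column, and using the species fields classification and language for illustration):

# which species does each person belong to?
# (people with an empty species list drop out of the default inner merge)
(dfp.explode("species")
 .merge(linkeddf["species"], left_on="species", right_on="url", suffixes=("_person", "_species"))
 .loc[:, ["name_person", "name_species", "classification", "language"]]
)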

Upvotes: 1
