LeLoupSolitaire

Reputation: 103

Get all the data from a REST API including nested API links

I am trying to retrieve JSON data from the REST API SWAPI, which has information about people, films, starships and planets in the Star Wars universe.

Here is my code:

import requests
import pandas as pd

total_results = []

for page_num in range(1, 7):
    # Build the URL and download the results
    url = "https://swapi.dev/api/people/?page=" + str(page_num)
    print("Downloading", url)
    response = requests.get(url)
    data = response.json()
    total_results = total_results + data['results']

print("We have", len(total_results), "total results")

SW_people_df = pd.json_normalize(total_results)
SW_people_df.head()
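
As a side note, the number of pages does not have to be hardcoded: each paginated response from swapi.dev also carries a next link (null on the last page), so the loop can simply follow it. A minimal sketch of that variant, assuming the response keys next and results:

import requests
import pandas as pd

total_results = []
url = "https://swapi.dev/api/people/"

# keep downloading pages until the API reports there is no next page
while url:
    print("Downloading", url)
    data = requests.get(url).json()
    total_results.extend(data["results"])
    url = data.get("next")  # None on the last page

SW_people_df = pd.json_normalize(total_results)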

Here is what the dataframe looks like:

  name            height mass hair_color skin_color   eye_color birth_year gender  species                              url
0 Luke Skywalker  172    77   blond      fair         blue      19BBY      male    []                                   http://swapi.dev/api/people/1/
1 C-3PO           167    75   n/a        gold         yellow    112BBY     n/a     ['http://swapi.dev/api/species/2/'] http://swapi.dev/api/people/2/
2 R2-D2           96     32   n/a        white, blue  red       33BBY      n/a     ['http://swapi.dev/api/species/2/'] http://swapi.dev/api/people/3/
3 Darth Vader     202    136  none       white        yellow    41.9BBY    male    []                                   http://swapi.dev/api/people/4/
4 Leia Organa     150    49   brown      light        brown     19BBY      female  []                                   http://swapi.dev/api/people/5/

My question:

Is it possible to retrieve the data from the API including the nested links? I.e., can I get the actual JSON data behind the nested links in the column SW_people_df['species'] instead of just a list of links?
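
For reference, each of those nested links is itself a small JSON record that can be fetched directly; a quick sketch using one of the species URLs from the table above:

import requests

# peek at the JSON record behind one of the nested species links
species_json = requests.get("https://swapi.dev/api/species/2/").json()
print(species_json["name"])
print(list(species_json.keys()))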

Thank you!

Upvotes: 0

Views: 563

Answers (1)

Rob Raymond

Reputation: 31236

Interesting requirement:

  • download all the people (same approach as yours, just more compressed code)
  • check each of the columns to see if it contains a link ("http")
  • create a dictionary keyed by each column that contains a link, where the value is a dataframe built by concatenating the results of all the links in that column
  • now you have all the data, so you can merge/join and analyse across the data categories
import requests
import pandas as pd

# people - pages 1 to 7
dfp = pd.concat([pd.json_normalize(requests.get(f"https://swapi.dev/api/people/?page={p}").json()["results"]) for p in range(1,7)])


# get all the related data from urls against ppl
linkeddf = {c:pd.concat([
    pd.json_normalize(requests.get(u).json()) for u in dfp[c].explode().dropna().unique()
]) for c in dfp.columns if dfp[c].explode().str.contains("http").any() and c!="url"}
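# (not in the original answer: linkeddf should now hold one dataframe per link-bearing
#  column of the people data, e.g. "homeworld", "films", "species", "vehicles", "starships")
# print(list(linkeddf.keys()))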


# join ppl to homeworld
dfp.merge(linkeddf["homeworld"], left_on="homeworld", right_on="url", suffixes=("_person","_world"))

# what films has a skywalker been in?
(dfp.explode("films").merge(linkeddf["films"], left_on="films", right_on="url", suffixes=("_person","_film"))
 .loc[:,["name","title"]]
 .query("name.str.contains('Sky')")
)

output

                 name                    title
0      Luke Skywalker               A New Hope
17     Luke Skywalker  The Empire Strikes Back
33     Luke Skywalker       Return of the Jedi
53     Luke Skywalker      Revenge of the Sith
61   Anakin Skywalker      Revenge of the Sith
79   Anakin Skywalker       The Phantom Menace
94     Shmi Skywalker       The Phantom Menace
115  Anakin Skywalker     Attack of the Clones
123    Shmi Skywalker     Attack of the Clones
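
To tie this back to the species column from the original question, the same pattern should work: species is a list column like films, so it needs explode first, and both frames have a name column, hence the suffixes. A sketch along those lines (assuming linkeddf picked up the species column, and using the species fields classification and language for illustration):

# which species does each person belong to?
# (people with an empty species list drop out of the default inner merge)
(dfp.explode("species")
 .merge(linkeddf["species"], left_on="species", right_on="url", suffixes=("_person", "_species"))
 .loc[:, ["name_person", "name_species", "classification", "language"]]
)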

Upvotes: 1
