Reputation: 61
I have a Json file that looks like the following. I want to grab the strings of names in the "actors" list and add them to a dataframe (which is empty now, the first item added to the dataframe would be the strings of actor names as rows).
{
"1": {
"title": "Exodus: Gods and Kings",
"url": "https://en.wikipedia.org/wiki/Exodus%3A%20Gods%20and%20Kings",
"year": "2014",
"poster": "https://upload.wikimedia.org/wikipedia/en/thumb/c/cd/Exodus2014Poster.jpg/220px-Exodus2014Poster.jpg",
"actors": [
"Christian Bale",
"Joel Edgerton",
"John Turturro",
"Aaron Paul",
"Ben Mendelsohn",
"Sigourney Weaver",
"Ben Kingsley"
]
},
...
I have tried using the following python code to do this but I am unsuccesful, I beleive because I am using a function wrong or not using the right function at all. Any suggestions as to what function/method to use?
# Create dataframe from json file
df_json = pd.read_json("movies_metadata.json", encoding='latin-1')
# Create new dataframe with actor names
data = [df.iloc[4]]
df = pd.DataFrame(data)
I strongly beleive that my code is very poor, but have had a hard time finding how to do this when googling.
Tried googling all around, as well as different methods from pandas to add items to dataframes
Upvotes: 1
Views: 549
Reputation: 11650
# read in the json file
df =pd.read_json('txt.json')
#if you have multiple json records, each will be its own columns
# filter the actor rows and then explode
df.loc['actors',:].explode()
1 Christian Bale
1 Joel Edgerton
1 John Turturro
1 Aaron Paul
1 Ben Mendelsohn
1 Sigourney Weaver
1 Ben Kingsley
2 2Christian Bale
2 2Joel Edgerton
2 2John Turturro
2 2Aaron Paul
2 2Ben Mendelsohn
2 2Sigourney Weaver
2 2Ben Kingsley
Name: actors, dtype: object
Resetting the index
df.loc['actors',:].explode().reset_index()
index actors
0 1 Christian Bale
1 1 Joel Edgerton
2 1 John Turturro
3 1 Aaron Paul
4 1 Ben Mendelsohn
5 1 Sigourney Weaver
6 1 Ben Kingsley
7 2 2Christian Bale
8 2 2Joel Edgerton
9 2 2John Turturro
10 2 2Aaron Paul
11 2 2Ben Mendelsohn
12 2 2Sigourney Weaver
13 2 2Ben Kingsley
Alternate Solution
(df[df.index.isin( ['actors','title'])]
.T
.explode('actors')
.reset_index())
index title actors
0 1 Exodus: Gods and Kings Christian Bale
1 1 Exodus: Gods and Kings Joel Edgerton
2 1 Exodus: Gods and Kings John Turturro
3 1 Exodus: Gods and Kings Aaron Paul
4 1 Exodus: Gods and Kings Ben Mendelsohn
5 1 Exodus: Gods and Kings Sigourney Weaver
6 1 Exodus: Gods and Kings Ben Kingsley
7 2 Exodus: Gods and Kings 2Christian Bale
8 2 Exodus: Gods and Kings 2Joel Edgerton
9 2 Exodus: Gods and Kings 2John Turturro
10 2 Exodus: Gods and Kings 2Aaron Paul
11 2 Exodus: Gods and Kings 2Ben Mendelsohn
12 2 Exodus: Gods and Kings 2Sigourney Weaver
13 2 Exodus: Gods and Kings 2Ben Kingsley
PS: I expanded your JSON file to hae two records in it
Upvotes: 1
Reputation: 195418
You can use list-comprehension to get actors from the dictionary and then construct a dataframe. For example:
data = {
"1": {
"title": "Exodus: Gods and Kings",
"url": "https://en.wikipedia.org/wiki/Exodus%3A%20Gods%20and%20Kings",
"year": "2014",
"poster": "https://upload.wikimedia.org/wikipedia/en/thumb/c/cd/Exodus2014Poster.jpg/220px-Exodus2014Poster.jpg",
"actors": [
"Christian Bale",
"Joel Edgerton",
"John Turturro",
"Aaron Paul",
"Ben Mendelsohn",
"Sigourney Weaver",
"Ben Kingsley",
],
}
}
df = pd.DataFrame(
[actor for v in data.values() for actor in v["actors"]], columns=["Actors"]
)
print(df)
Prints:
Actors
0 Christian Bale
1 Joel Edgerton
2 John Turturro
3 Aaron Paul
4 Ben Mendelsohn
5 Sigourney Weaver
6 Ben Kingsley
Upvotes: 2