EVS_93
EVS_93

Reputation: 191

Read json with pandas, problem with columns

I'm trying to convert a json file to a csv table using pandas.

My json file is from Game of Thrones and contains information from the characters. The problem is that each row can contain different keys of the dictionary.

I want to build a csv file with all the unique keys from every row of the json file as columns.

Here is the link to the json file

This is what I done so far:

import pandas as pd
file = link
df = pd.read_json(file)
df2 = df['characters']
#from here on I'm stuck

Upvotes: 0

Views: 806

Answers (2)

GMaster
GMaster

Reputation: 39

If you already have downloaded the json file and have saved for example with name file.json you can use this code:

import pandas as pd
import json


with open('file.json') as f:
    d = json.load(f)['characters']
    df = pd.DataFrame(d)
df.to_csv("characters.csv")

Or if you want to download json file from URL you can use this:

import pandas as pd
import json
from urllib.request import urlopen

url = "https://raw.githubusercontent.com/jeffreylancaster/game-of- 
thrones/master/data/characters.json"
response = urlopen(URL)
data_json = json.loads(response.read())

characters = data_json['characters']
df = pd.DataFrame(characters)
df.to_csv("characters.csv")

Upvotes: 1

user5386938
user5386938

Reputation:

See pandas.json_normalize.

>>> import pandas as pd
>>> path = 'characters.json'
>>> df = pd.read_json(path)
>>> df2 = pd.json_normalize(df['characters'])
>>> df2.columns
Index(['characterName', 'characterLink', 'actorName', 'actorLink', 'houseName',
       'royal', 'parents', 'siblings', 'killedBy', 'characterImageThumb',
       'characterImageFull', 'nickname', 'killed', 'servedBy', 'parentOf',
       'marriedEngaged', 'serves', 'kingsguard', 'guardedBy', 'actors',
       'guardianOf', 'allies', 'abductedBy', 'abducted', 'sibling'],
      dtype='object')
>>> df2.loc[df2['characterName']=='Sansa Stark']
    characterName          characterLink      actorName         actorLink  ... allies abductedBy abducted sibling
300   Sansa Stark  /character/ch0158137/  Sophie Turner  /name/nm3849842/  ...    NaN        NaN      NaN     NaN

[1 rows x 25 columns]
>>>
>>> df2.loc[df2['characterName']=='Sansa Stark']['siblings']
300    [Robb Stark, Arya Stark, Bran Stark, Rickon St...
Name: siblings, dtype: object
>>> df2.to_csv('got.csv', index=False)

Upvotes: 2

Related Questions