Reputation: 191
I'm trying to convert a json file to a csv table using pandas.
My json file is from Game of Thrones and contains information from the characters. The problem is that each row can contain different keys of the dictionary.
I want to build a csv file with all the unique keys from every row of the json file as columns.
Here is the link to the json file
This is what I done so far:
import pandas as pd
file = link
df = pd.read_json(file)
df2 = df['characters']
#from here on I'm stuck
Upvotes: 0
Views: 806
Reputation: 39
If you already have downloaded the json file and have saved for example with name file.json you can use this code:
import pandas as pd
import json
with open('file.json') as f:
d = json.load(f)['characters']
df = pd.DataFrame(d)
df.to_csv("characters.csv")
Or if you want to download json file from URL you can use this:
import pandas as pd
import json
from urllib.request import urlopen
url = "https://raw.githubusercontent.com/jeffreylancaster/game-of-
thrones/master/data/characters.json"
response = urlopen(URL)
data_json = json.loads(response.read())
characters = data_json['characters']
df = pd.DataFrame(characters)
df.to_csv("characters.csv")
Upvotes: 1
Reputation:
>>> import pandas as pd
>>> path = 'characters.json'
>>> df = pd.read_json(path)
>>> df2 = pd.json_normalize(df['characters'])
>>> df2.columns
Index(['characterName', 'characterLink', 'actorName', 'actorLink', 'houseName',
'royal', 'parents', 'siblings', 'killedBy', 'characterImageThumb',
'characterImageFull', 'nickname', 'killed', 'servedBy', 'parentOf',
'marriedEngaged', 'serves', 'kingsguard', 'guardedBy', 'actors',
'guardianOf', 'allies', 'abductedBy', 'abducted', 'sibling'],
dtype='object')
>>> df2.loc[df2['characterName']=='Sansa Stark']
characterName characterLink actorName actorLink ... allies abductedBy abducted sibling
300 Sansa Stark /character/ch0158137/ Sophie Turner /name/nm3849842/ ... NaN NaN NaN NaN
[1 rows x 25 columns]
>>>
>>> df2.loc[df2['characterName']=='Sansa Stark']['siblings']
300 [Robb Stark, Arya Stark, Bran Stark, Rickon St...
Name: siblings, dtype: object
>>> df2.to_csv('got.csv', index=False)
Upvotes: 2