Reputation: 21
I have a JSON file, which I have converted to a dictionary. Within the file there are what I call the 'easy headings', 'year' and 'category', which are standalone fields. The new column I want to create will be called 'awarded_or_not', and its data should be derived from the entries under 'laureates' in the JSON file.
So far I have this to retrieve and print the two 'easy headings':
import json
import pandas as pd
def report(nobelprizeDict):
    # convert dictionary to DataFrame
    df = pd.DataFrame.from_dict(nobelprizeDict)
    # select columns 'year' and 'category'
    res = df[['year', 'category']]
    # return result
    return res
with open('nobelprizes.json', 'rt') as f:
    nobel = json.load(f)
df_years_categories = report(nobel)
print(df_years_categories)
For example, if I write res = df[['year', 'category', 'laureates']], the 'laureates' column ends up holding the whole list of entries from the laureates dictionary for each row.
I hope this makes sense and that someone can correct it so I can see what I have done wrong.
Upvotes: 0
Views: 89
Reputation: 2084
Here is an example. I use NumPy to determine whether 'laureates' has a value, then add a column whose value depends on whether it is missing. Note that the DataFrame is built from nobelprizeDict['prizes'] (in my case) rather than from the top-level dictionary:
import json
import pandas as pd
import numpy as np
def report(nobelprizeDict):
    # the prize records sit under the top-level 'prizes' key
    df = pd.DataFrame.from_dict(nobelprizeDict['prizes'])
    # select the columns; copy so later assignments do not write to a slice of df
    res = df[['year', 'category', 'laureates']].copy()
    return res
with open('nobelprizes.json', 'rt') as f:
    nobel = json.load(f)
df_years_categories = report(nobel)
# True for rows with a missing value, i.e. no 'laureates' entry (this overwrites the lists)
df_years_categories['laureates'] = df_years_categories.isna().any(axis=1)
# 'NO' where laureates are missing, 'YES' otherwise
df_years_categories['awarded_or_not'] = np.where(df_years_categories['laureates'], 'NO', 'YES')
print(df_years_categories)
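If you would rather not overwrite the 'laureates' column, a variation along these lines might work. It is only a sketch: the `sample` dictionary is made up to stand in for the contents of nobelprizes.json, and it assumes that prizes which were not awarded simply have no 'laureates' key, so pandas fills that cell with NaN.

```python
import pandas as pd

# Hypothetical stand-in for the loaded JSON; the real file's layout is an assumption
sample = {
    "prizes": [
        {"year": "2020", "category": "physics",
         "laureates": [{"id": "1", "firstname": "A", "surname": "B"}]},
        {"year": "1940", "category": "physics"},  # no 'laureates' key: not awarded
    ]
}

def report(nobel_dict):
    # one row per prize record
    df = pd.DataFrame(nobel_dict["prizes"])
    # keep only the two simple columns; copy so the new column is added safely
    res = df[["year", "category"]].copy()
    # notna() is False where the record had no 'laureates' entry
    res["awarded_or_not"] = df["laureates"].notna().map({True: "YES", False: "NO"})
    return res

result = report(sample)
print(result)
```

This keeps the result to 'year', 'category' and 'awarded_or_not' only, and checks the 'laureates' column directly instead of looking for a missing value anywhere in the row.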
Upvotes: 1