Reputation: 1465
I have the following data sample:
{"rates":{
    "IT":{
        "country_name":"Italy",
        "standard_rate":20,
        "reduced_rates":{
            "food":13,
            "books":11
        }
    },
    "UK":{
        "country_name":"United Kingdom",
        "standard_rate":21,
        "reduced_rates":{
            "food":12,
            "books":1
        }
    }
}}
The keys IT and UK are country codes, and they can change. Every time I sample the data there may be different keys, so there is no constant key name I can rely on.
I have the following code that creates the dataframe:
import pandas as pd
df = pd.DataFrame(columns=['code', 'country_name'])
for k, item in dic['rates'].items():
    # append returns a new copy of df on every call (removed in pandas 2.0)
    df = df.append({'code': k, 'country_name': item['country_name']}, ignore_index=True)
This gives me:
  code    country_name
0   IT           Italy
1   UK  United Kingdom
This works, but the docs (https://pandas.pydata.org/pandas-docs/stable/generated/pandas.DataFrame.append.html) note that appending rows one at a time like this is inefficient.
The docs suggest using:
pd.concat([pd.DataFrame([i], columns=['A']) for i in range(5)], ignore_index=True)
So I tried:
new = pd.concat([pd.DataFrame([item], columns=['code', 'country_name']) for k,item in dic['rates'].items()], ignore_index=True)
However, this gives:
  code    country_name
0  NaN           Italy
1  NaN  United Kingdom
I understand this happens because there is no key called code in the sample; it is just a name I assigned to the DataFrame column. But I don't know how to fix it.
Suggestions?
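The cause described above can be reproduced in isolation. When pd.DataFrame receives a list of dicts together with columns=, it keeps only those keys from each dict and fills missing ones with NaN; code corresponds to the outer key k, not to a key inside item. A minimal sketch, assuming the sample data above, that reproduces the NaN and then keeps the concat approach by copying k into each row dict:

```python
import pandas as pd

# Sample data from the question
dic = {"rates": {
    "IT": {"country_name": "Italy", "standard_rate": 20,
           "reduced_rates": {"food": 13, "books": 11}},
    "UK": {"country_name": "United Kingdom", "standard_rate": 21,
           "reduced_rates": {"food": 12, "books": 1}},
}}

# Reproduces the problem: columns= keeps only those keys of `item`,
# and 'code' is not one of them, so that column is filled with NaN.
broken = pd.concat(
    [pd.DataFrame([item], columns=['code', 'country_name'])
     for k, item in dic['rates'].items()],
    ignore_index=True,
)

# Keeps the concat approach: copy each item into a new dict that also
# holds the outer key under the name 'code'.
fixed = pd.concat(
    [pd.DataFrame([{'code': k, **item}], columns=['code', 'country_name'])
     for k, item in dic['rates'].items()],
    ignore_index=True,
)
print(fixed)
```

This still builds one small DataFrame per row before concatenating; building a single frame from a list of dicts avoids that.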
Upvotes: 0
Views: 135
Reputation: 344
You can get the result you're looking for with built-in pandas functionality:
df = pd.DataFrame.from_dict(dic["rates"])
This gives the transpose of what you want: the country codes are columns and the fields are rows. Transpose it back with:
df = df.T
This yields the expected shape, but with the country codes as the index.
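Alternatively, from_dict accepts orient="index", which treats the outer keys (the country codes) as row labels, so the separate transpose step is not needed. A minimal sketch, assuming the sample data from the question:

```python
import pandas as pd

# Sample data from the question
dic = {"rates": {
    "IT": {"country_name": "Italy", "standard_rate": 20,
           "reduced_rates": {"food": 13, "books": 11}},
    "UK": {"country_name": "United Kingdom", "standard_rate": 21,
           "reduced_rates": {"food": 12, "books": 1}},
}}

# orient="index": outer keys become the row index, inner keys the columns
df = pd.DataFrame.from_dict(dic["rates"], orient="index")
print(df)
```

The resulting frame has the country codes as its index, so the reset_index step applies to it unchanged.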
df = df.reset_index()
df = df.rename(columns={"index": "country_code"})
The result also includes the other fields from the dictionary (standard_rate and reduced_rates), which you may not want. You can remove them with drop, or more simply select the columns you need:
df = df[["country_code", "country_name"]]
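Both options give the same two-column result. A minimal sketch, assuming the sample data from the question and renaming with rename(columns=...); the dropped column names are taken from that sample:

```python
import pandas as pd

# Sample data from the question
dic = {"rates": {
    "IT": {"country_name": "Italy", "standard_rate": 20,
           "reduced_rates": {"food": 13, "books": 11}},
    "UK": {"country_name": "United Kingdom", "standard_rate": 21,
           "reduced_rates": {"food": 12, "books": 1}},
}}

df = pd.DataFrame.from_dict(dic["rates"]).T.reset_index()
df = df.rename(columns={"index": "country_code"})

# Option 1: drop the fields you don't need
dropped = df.drop(columns=["standard_rate", "reduced_rates"])

# Option 2: select the fields you do need
selected = df[["country_code", "country_name"]]

print(dropped.equals(selected))
```

Selecting is also more robust if later samples contain extra fields, since drop keeps any column it was not told to remove.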
The from_dict, transpose and reset_index steps can be condensed into a single line of code.
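For example, the whole sequence can be written as one method chain. A minimal sketch, assuming the sample data from the question and rename(columns=...) for the renaming step:

```python
import pandas as pd

# Sample data from the question
dic = {"rates": {
    "IT": {"country_name": "Italy", "standard_rate": 20,
           "reduced_rates": {"food": 13, "books": 11}},
    "UK": {"country_name": "United Kingdom", "standard_rate": 21,
           "reduced_rates": {"food": 12, "books": 1}},
}}

df = (
    pd.DataFrame.from_dict(dic["rates"])
    .T                                          # country codes become the index
    .reset_index()                              # index becomes a column named "index"
    .rename(columns={"index": "country_code"})
    [["country_code", "country_name"]]          # keep only the needed fields
)
print(df)
```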
I would expect built-in pandas functionality to be more efficient than iterating through the dict items in Python, but I'd suggest benchmarking on larger datasets to see how the different methods scale: in general, pandas has a fixed overhead that can make it slower on small datasets, while it tends to scale well.
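A minimal benchmarking sketch along those lines. It assumes synthetic data from a hypothetical make_rates helper (the generated codes C0, C1, ... are made up) and compares the from_dict approach with a list comprehension over the dict items. Timings depend on the machine and the pandas version.

```python
import timeit

import pandas as pd


def make_rates(n):
    # Hypothetical synthetic data shaped like dic["rates"]
    return {f"C{i}": {"country_name": f"Country {i}", "standard_rate": 20,
                      "reduced_rates": {"food": 10, "books": 5}}
            for i in range(n)}


def with_from_dict(rates):
    # Built-in pandas route: build, transpose, move index into a column
    df = pd.DataFrame.from_dict(rates).T.reset_index()
    df = df.rename(columns={"index": "country_code"})
    return df[["country_code", "country_name"]]


def with_list_comprehension(rates):
    # Iterate over the dict items and build the frame in one call
    return pd.DataFrame([{"country_code": k, "country_name": v["country_name"]}
                         for k, v in rates.items()])


rates = make_rates(5_000)
for fn in (with_from_dict, with_list_comprehension):
    runs = 3
    seconds = timeit.timeit(lambda: fn(rates), number=runs)
    print(f"{fn.__name__}: {seconds / runs:.4f} s per call")
```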
Upvotes: 1
Reputation: 82765
Use a list comprehension to build one dict per row, then create the DataFrame in a single call.
Example:
import pandas as pd
dic = {"rates":{
    "IT":{
        "country_name":"Italy",
        "standard_rate":20,
        "reduced_rates":{
            "food":13,
            "books":11
        }
    },
    "UK":{
        "country_name":"United Kingdom",
        "standard_rate":21,
        "reduced_rates":{
            "food":12,
            "books":1
        }
    }
}}
df = pd.DataFrame([{'code': k, 'country_name': v["country_name"]} for k, v in dic["rates"].items()])
print(df)
Output:
  code    country_name
0   IT           Italy
1   UK  United Kingdom
Upvotes: 1