Luis

How to properly assign values to dataframe from strings?

I have the following data sample:

{"rates": {
    "IT": {
        "country_name": "Italy",
        "standard_rate": 20,
        "reduced_rates": {
            "food": 13,
            "books": 11
        }
    },
    "UK": {
        "country_name": "United Kingdom",
        "standard_rate": 21,
        "reduced_rates": {
            "food": 12,
            "books": 1
        }
    }
}}

IT and UK are country codes, and they can change: every time I sample the data there may be different keys. There isn't a constant key name I can rely on.

I have the following code that creates the dataframe:

import pandas as pd
# Note: DataFrame.append was deprecated in pandas 1.4 and removed in 2.0
df = pd.DataFrame(columns=['code', 'country_name'])
for k, item in dic['rates'].items():
    df = df.append({'code': k, 'country_name': item['country_name']}, ignore_index=True)

This gives me:

  code    country_name
0  IT       Italy
1  UK       United Kingdom

This works, but the docs (https://pandas.pydata.org/pandas-docs/stable/generated/pandas.DataFrame.append.html) say that appending rows one at a time like this is inefficient.

Instead, the docs suggest:

pd.concat([pd.DataFrame([i], columns=['A']) for i in range(5)], ignore_index=True)

So I tried to do:

new = pd.concat([pd.DataFrame([item], columns=['code', 'country_name']) for k,item in dic['rates'].items()], ignore_index=True)

However this gives:

  code    country_name
0  NaN           Italy
1  NaN  United Kingdom

I understand this happens because there is no key called 'code' in the sample; 'code' is just a name I assigned to the DataFrame column. But I don't know how to fix this.
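A minimal sketch reproducing the NaN and one way around it. The {"code": k, **item} merge is a suggestion, not something from the pandas docs, and the data is a trimmed copy of the sample (reduced_rates omitted) so the snippet runs on its own. The columns argument only selects keys that already exist in each row dict, so the dictionary key has to be put into the row before the frame is built:

```python
import pandas as pd

# Trimmed copy of the sample data (reduced_rates omitted for brevity)
dic = {"rates": {
    "IT": {"country_name": "Italy", "standard_rate": 20},
    "UK": {"country_name": "United Kingdom", "standard_rate": 21},
}}

# Each row dict has no 'code' key, so that column is filled with NaN
broken = pd.concat(
    [pd.DataFrame([item], columns=["code", "country_name"])
     for k, item in dic["rates"].items()],
    ignore_index=True,
)
print(broken)

# Merging the dictionary key into the row dict gives 'code' a value
fixed = pd.concat(
    [pd.DataFrame([{"code": k, **item}], columns=["code", "country_name"])
     for k, item in dic["rates"].items()],
    ignore_index=True,
)
print(fixed)
```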

Suggestions?

Answers (2)

NaT3z

You can get the result you're looking for with built-in pandas functionality.

df = pd.DataFrame.from_dict(dic["rates"])

This gives the transpose of what you're looking for (one column per country), which you can fix with:

df = df.T

This gives the right shape, but with the country codes as the index. To move them into a column named country_code:

df = df.reset_index()
df = df.rename(columns={"index": "country_code"})

The result also includes the other fields from the dictionary (standard_rate and reduced_rates), which you may not want. You can remove them with the drop method or, more simply, select only the columns you need:

df = df[["country_code", "country_name"]]
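For comparison, a sketch of the drop route, using the sample dic from the question (reproduced so it runs standalone). It removes the unwanted fields by name, so unlike selecting columns it would keep any new fields that show up in later samples:

```python
import pandas as pd

dic = {"rates": {
    "IT": {"country_name": "Italy", "standard_rate": 20,
           "reduced_rates": {"food": 13, "books": 11}},
    "UK": {"country_name": "United Kingdom", "standard_rate": 21,
           "reduced_rates": {"food": 12, "books": 1}},
}}

# Build, transpose, and move the country codes from the index into a column
df = pd.DataFrame.from_dict(dic["rates"]).T.reset_index()
df = df.rename(columns={"index": "country_code"})

# Drop the fields we don't need instead of selecting the ones we do
df = df.drop(columns=["standard_rate", "reduced_rates"])
print(df)
```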

Note that at least the first three steps can be condensed into a single line.
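A sketch of the steps above chained into one expression, using the sample dic from the question. The second variant swaps in from_dict(..., orient="index") plus rename_axis, which is not part of the original answer: it treats the outer keys as rows, so no transpose is needed.

```python
import pandas as pd

dic = {"rates": {
    "IT": {"country_name": "Italy", "standard_rate": 20,
           "reduced_rates": {"food": 13, "books": 11}},
    "UK": {"country_name": "United Kingdom", "standard_rate": 21,
           "reduced_rates": {"food": 12, "books": 1}},
}}

# Steps above chained: build, transpose, codes to a column, rename, select
df = (
    pd.DataFrame.from_dict(dic["rates"])
    .T
    .reset_index()
    .rename(columns={"index": "country_code"})[["country_code", "country_name"]]
)

# Alternative: orient="index" makes each country a row directly;
# rename_axis names the index so reset_index creates "country_code"
df2 = (
    pd.DataFrame.from_dict(dic["rates"], orient="index")
    .rename_axis("country_code")
    .reset_index()[["country_code", "country_name"]]
)
print(df)
print(df2)
```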

I'd expect that relying on built-in pandas functionality is more efficient than iterating through the dict items, and it is likely preferable. Still, I'd suggest testing on larger datasets to see how the different methods scale: in general, pandas' overhead makes it slower on small datasets, but it scales well.
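One rough way to run that comparison, as a sketch: make_rates is a hypothetical helper that generates synthetic data shaped like the sample, the 5,000-country size is arbitrary, and timing uses the standard-library timeit module. It compares the from_dict approach above with building row dicts in a list comprehension:

```python
import timeit

import pandas as pd


def make_rates(n):
    """Hypothetical helper: a synthetic 'rates' dict with n fake country codes."""
    return {
        f"C{i}": {
            "country_name": f"Country {i}",
            "standard_rate": 20,
            "reduced_rates": {"food": 10, "books": 5},
        }
        for i in range(n)
    }


def via_from_dict(rates):
    # Built-in constructor, then transpose so countries become rows
    df = pd.DataFrame.from_dict(rates).T.reset_index()
    return df.rename(columns={"index": "country_code"})[["country_code", "country_name"]]


def via_list_comprehension(rates):
    # Build plain row dicts first, then construct the DataFrame once
    return pd.DataFrame(
        [{"country_code": k, "country_name": v["country_name"]} for k, v in rates.items()]
    )


rates = make_rates(5_000)
for fn in (via_from_dict, via_list_comprehension):
    seconds = timeit.timeit(lambda: fn(rates), number=3)
    print(f"{fn.__name__}: {seconds / 3:.4f} s per call")
```

Timings will vary with pandas version and hardware, so it is worth running against data shaped like your own.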

Rakesh

Use a list comprehension to build the rows first, then create the DataFrame in a single call:

Ex:

import pandas as pd

dic = {"rates": {
    "IT": {
        "country_name": "Italy",
        "standard_rate": 20,
        "reduced_rates": {
            "food": 13,
            "books": 11
        }
    },
    "UK": {
        "country_name": "United Kingdom",
        "standard_rate": 21,
        "reduced_rates": {
            "food": 12,
            "books": 1
        }
    }
}}

df = pd.DataFrame([{'code': k, 'country_name': v["country_name"]} for k,v in dic["rates"].items()])
print(df)

Output:

  code    country_name
0   IT           Italy
1   UK  United Kingdom
