mitsi
mitsi

Reputation: 1165

Convert a dict to a pandas DataFrame

My data look like this :

{u'"57e01311817bc367c030b390"': u'{"ad_since": 2016, "indoor_swimming_pool": "No", "seaside": "No", "handicapped_access": "Yes"}', u'"57e01311817bc367c030b3a8"': u'{"ad_since": 2012, "indoor_swimming_pool": "No", "seaside": "No", "handicapped_access": "Yes"}'}

I want to convert it to a pandas Dataframe. But when I try

df = pd.DataFrame(response.items())

I get a DataFrame with two columns, the first with the first key, and the second with the values of the key:

                            0                       1 
0  "57e01311817bc367c030b390"   {"ad_since": 2016, "indoor_swimming_pool": "No...
1  "57e01311817bc367c030b3a8"   {"ad_since": 2012, "indoor_swimming_pool": "No... 

How can I get a single column for each key : "ad_since", "indoor_swimming_pool", "indoor_swimming_pool" ? And keep the first column, or get the id as index.

Upvotes: 3

Views: 2225

Answers (2)

Ami Tavory
Ami Tavory

Reputation: 76297

As the values are strings, you can use the json module and list comprehension:

In [20]: d =     {u'"57e01311817bc367c030b390"': u'{"ad_since": 2016, "indoor_swimming_pool": "No", "seaside": "No", "handicapped_access": "Yes"}', u'"57e01311817bc367c030b3a8"': u'{"ad_since": 2012, "indoor_swimming_pool": "No", "seaside": "No", "handicapped_access": "Yes"}'}

In [21]: import json

In [22]: pd.DataFrame(dict([(k, [json.loads(e)[k] for e in d.values()]) for k in json.loads(d.values()[0])]), index=d.keys())Out[22]: 
                            ad_since handicapped_access indoor_swimming_pool  \
"57e01311817bc367c030b390"      2016                Yes                   No   
"57e01311817bc367c030b3a8"      2012                Yes                   No   

                       seaside  
"57e01311817bc367c030b390"      No  
"57e01311817bc367c030b3a8"      No  

Upvotes: 1

jezrael
jezrael

Reputation: 862511

You need convert column of type str to dict by .apply(literal_eval) or .apply(json.loads) and then use DataFrame.from_records:

import pandas as pd
from ast import literal_eval

response = {u'"57e01311817bc367c030b390"': u'{"ad_since": 2016, "indoor_swimming_pool": "No", "seaside": "No", "handicapped_access": "Yes"}', 
           u'"57e01311817bc367c030b3a8"': u'{"ad_since": 2012, "indoor_swimming_pool": "No", "seaside": "No", "handicapped_access": "Yes"}'}

df = pd.DataFrame.from_dict(response, orient='index')

print (type(df.iloc[0,0]))
<class 'str'>

df.iloc[:,0] = df.iloc[:,0].apply(literal_eval)

print (pd.DataFrame.from_records(df.iloc[:,0].values.tolist(), index=df.index))
                            ad_since handicapped_access indoor_swimming_pool  \
"57e01311817bc367c030b3a8"      2012                Yes                   No   
"57e01311817bc367c030b390"      2016                Yes                   No   

                           seaside  
"57e01311817bc367c030b3a8"      No  
"57e01311817bc367c030b390"      No  

import pandas as pd
import json

response = {u'"57e01311817bc367c030b390"': u'{"ad_since": 2016, "indoor_swimming_pool": "No", "seaside": "No", "handicapped_access": "Yes"}', 
           u'"57e01311817bc367c030b3a8"': u'{"ad_since": 2012, "indoor_swimming_pool": "No", "seaside": "No", "handicapped_access": "Yes"}'}


df = pd.DataFrame.from_dict(response, orient='index')
df.iloc[:,0] = df.iloc[:,0].apply(json.loads)


print (pd.DataFrame.from_records(df.iloc[:,0].values.tolist(), index=df.index))
                            ad_since handicapped_access indoor_swimming_pool  \
"57e01311817bc367c030b3a8"      2012                Yes                   No   
"57e01311817bc367c030b390"      2016                Yes                   No   

                           seaside  
"57e01311817bc367c030b3a8"      No  
"57e01311817bc367c030b390"      No  

Upvotes: 2

Related Questions