Bastien

Reputation: 382

long-format pandas dataframe to dictionary

I can find help and documentation on converting a pandas DataFrame to a dictionary in which columns are keys and rows are values. However, I am stuck when I want one column's values as the keys and the associated values from another column as the values, so that a df like this

a b
1 car
1 train
2 boot
2 computer
2 lipstick

converts to the following dictionary: {'1': ['car', 'train'], '2': ['boot', 'computer', 'lipstick']}

I have a feeling it's something pretty simple, but I'm out of ideas. I tried df.groupby('a').to_dict() but was unsuccessful.

Any suggestions?

Upvotes: 1

Views: 1395

Answers (3)

unutbu

Reputation: 880339

You could view this as a groupby-aggregation (i.e., an operation which turns each group into one value -- in this case a list):

In [85]: df.groupby(['a'])['b'].agg(lambda grp: list(grp))
Out[85]: 
a
1                  [car, train]
2    [boot, computer, lipstick]
dtype: object

In [68]: df.groupby(['a'])['b'].agg(lambda grp: list(grp)).to_dict()
Out[68]: {1: ['car', 'train'], 2: ['boot', 'computer', 'lipstick']}
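Note that the keys above are integers, while the question shows string keys. If string keys are really wanted, one possible sketch is to group by the key column cast to str. This assumes the sample data from the question and a pandas version recent enough that agg accepts the built-in list directly:

```python
import pandas as pd

# Sample data from the question
df = pd.DataFrame({"a": [1, 1, 2, 2, 2],
                   "b": ["car", "train", "boot", "computer", "lipstick"]})

# Group by column 'a' cast to str so the resulting dictionary keys are strings,
# then collect each group's 'b' values into a plain list
result = df.groupby(df["a"].astype(str))["b"].agg(list).to_dict()
print(result)
```

With this data, result should map '1' and '2' to the same lists as in Out[68].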

Upvotes: 2

Anzel

Reputation: 20563

Yes, because DataFrameGroupBy has no to_dict attribute; to_dict is a method of DataFrame (and Series), not of the groupby object.

DataFrame.to_dict(outtype='dict'): Convert DataFrame to dictionary. (In later pandas versions this parameter is named orient.)

You can read more about DataFrame.to_dict here

Take a look at this:

import numpy as np
import pandas as pd

df = pd.DataFrame([np.random.sample(9), np.random.sample(9)])
df.columns = list('abcdefghi')
# it will convert the DataFrame to dict, with {column -> {index -> value}}
df.to_dict()
{'a': {0: 0.53252618404947039, 1: 0.78237275521385163},
 'b': {0: 0.43681232450879315, 1: 0.31356312459390356},
 'c': {0: 0.84648298651737541, 1: 0.81417040486070058},
 'd': {0: 0.48419015448536995, 1: 0.37578177386187273},
 'e': {0: 0.39840348154035421, 1: 0.35367537180764919},
 'f': {0: 0.050381560155985827, 1: 0.57080653289506755},
 'g': {0: 0.96491634442628171, 1: 0.32844653606404517},
 'h': {0: 0.68201236712813085, 1: 0.0097104037581828839},
 'i': {0: 0.66836630467152902, 1: 0.69104505886376366}}

type(df)
pandas.core.frame.DataFrame

# DataFrame.groupby is another type
type(df.groupby('a'))
pandas.core.groupby.DataFrameGroupBy

df.groupby('a').to_dict()
AttributeError: Cannot access callable attribute 'to_dict' of 'DataFrameGroupBy' objects, try using the 'apply' method

Upvotes: 1

Ajean

Reputation: 5659

You can't call to_dict() on the result of groupby, but you can iterate over that result to build the dictionary yourself. The following code works with the example you provided.

import pandas as pd

df = pd.DataFrame(dict(a=[1,1,2,2,2],
                       b=['car', 'train', 'boot', 'computer', 'lipstick']))
# Using a loop
dt = {}
for g, d in df.groupby('a'):
    dt[g] = d['b'].values

# Using dictionary comprehension
dt2 = {g: d['b'].values for g, d in df.groupby('a')}

Now both dt and dt2 will be dictionaries like this:

{1: array(['car', 'train'], dtype=object),
 2: array(['boot', 'computer', 'lipstick'], dtype=object)}

Of course, you can convert the NumPy arrays back into lists if you so desire.
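As a minimal sketch of that conversion, calling tolist() on each group's column instead of taking .values should give plain Python lists. The name dt3 is only illustrative, and the data is the same sample frame as above:

```python
import pandas as pd

df = pd.DataFrame({"a": [1, 1, 2, 2, 2],
                   "b": ["car", "train", "boot", "computer", "lipstick"]})

# Series.tolist() returns a plain Python list rather than a NumPy array
dt3 = {g: d["b"].tolist() for g, d in df.groupby("a")}
```

Wrapping the arrays from dt or dt2 in list(...) afterwards would work as well; tolist() just avoids the intermediate array.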

Upvotes: 1
