Abhishek Jain

Reputation: 630

Convert pandas df to a dictionary

I need to convert a DataFrame in the following format:

import pandas as pd

d = {
    'A': ['a1', 'a1', 'a1', 'a1', 'a1', 'a1', 'a1', 'a2', 'a2', 'a2', 'a2', 'a2', 'a2', 'a2', 'a2'],
    'B': ['b1', 'b1', 'b1', 'b1', 'b2', 'b2', 'b2', 'b3', 'b3', 'b3', 'b3', 'b3', 'b3', 'b4', 'b4', ],
    'C': ['c1', 'c2', 'c3', 'c4', 'c5', 'c6', 'c7', 'c8', 'c9', 'c10', 'c11', 'c12', 'c13', 'c14', 'c15', ],
}

df = pd.DataFrame(d)
df
    A   B   C
0   a1  b1  c1
1   a1  b1  c2
2   a1  b1  c3
3   a1  b1  c4
4   a1  b2  c5
5   a1  b2  c6
6   a1  b2  c7
7   a2  b3  c8
8   a2  b3  c9
9   a2  b3  c10
10  a2  b3  c11
11  a2  b3  c12
12  a2  b3  c13
13  a2  b4  c14
14  a2  b4  c15

into a dictionary in the following format:

outDict = {
    'a1': {
        'b1': ['c1', 'c2', 'c3', 'c4'],
        'b2': ['c5', 'c6', 'c7'],
    },
    'a2': {
        'b3': ['c8', 'c9', 'c10', 'c11', 'c12', 'c13'],
        'b4': ['c14', 'c15'],
    }
}

That is, the values in column A become the first-level keys, the values in column B become the second-level keys, and the values in column C are collected into a list under each (A, B) pair.

Any pointers?

Upvotes: 5

Views: 371

Answers (4)

hilberts_drinking_problem

Reputation: 11602

A recursive solution seems pretty natural and works for any number of columns. We group by the leftmost column and recursively convert the remaining columns to the desired format. If only one column is left, it is returned as a list.

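def df2dict_rec(df):
    # Base case: a single remaining column becomes a plain list.
    if df.shape[1] == 1:
        return df.values[:, 0].tolist()
    # Otherwise group by the leftmost column and recurse on the rest.
    return {k: df2dict_rec(df_k.iloc[:, 1:]) for k, df_k in df.groupby(df.columns[0])}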


res = df2dict_rec(df)
# {'a1': {'b1': ['c1', 'c2', 'c3', 'c4'], 'b2': ['c5', 'c6', 'c7']},
#  'a2': {'b3': ['c8', 'c9', 'c10', 'c11', 'c12', 'c13'], 'b4': ['c14', 'c15']}}
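
To illustrate the "any number of columns" claim, here is a small sketch that runs the same recursion on a four-column frame. The frame `df4` and its values are made up for this example and are not from the question; the function is repeated so the snippet runs on its own.

```python
import pandas as pd

def df2dict_rec(df):
    # Base case: one column left, return its values as a list.
    if df.shape[1] == 1:
        return df.iloc[:, 0].tolist()
    # Recursive case: group by the leftmost column, recurse on the rest.
    return {k: df2dict_rec(g.iloc[:, 1:]) for k, g in df.groupby(df.columns[0])}

# Hypothetical four-column frame (assumption, not from the question).
df4 = pd.DataFrame({
    'A': ['a1', 'a1', 'a1', 'a2'],
    'B': ['b1', 'b1', 'b2', 'b3'],
    'C': ['c1', 'c1', 'c2', 'c3'],
    'D': ['d1', 'd2', 'd3', 'd4'],
})

res4 = df2dict_rec(df4)
print(res4)
```

Each extra column adds one more level of nesting, and only the last column ends up in the lists.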

Upvotes: 2

Sayandip Dutta

Reputation: 15872

This will be a little long, I dare say:


new_dict = {k: v['C'] for k, v in df.groupby(['A', 'B']).agg(list)
                                    .groupby(level=0)
                                    .apply(lambda df: df.xs(df.name).to_dict())
                                    .to_dict().items()}

print(new_dict)

Output:

{
    'a1': {
        'b1': ['c1', 'c2', 'c3', 'c4'],
        'b2': ['c5', 'c6', 'c7'],
    },
    'a2': {
        'b3': ['c8', 'c9', 'c10', 'c11', 'c12', 'c13'],
        'b4': ['c14', 'c15'],
    }
}

Unpacked:

>>> df.groupby(['A', 'B']).agg(list)
                                  C
A  B                               
a1 b1              [c1, c2, c3, c4]
   b2                  [c5, c6, c7]
a2 b3  [c8, c9, c10, c11, c12, c13]
   b4                    [c14, c15]
# Group by level 0 again; inside apply, xs selects the sub-frame for each
# level-0 key, which is then converted to a dict.
>>> df.groupby(['A', 'B']).agg(list).groupby(level=0).apply(lambda df: df.xs(df.name).to_dict())
A
a1    {'C': {'b1': ['c1', 'c2', 'c3', 'c4'], 'b2': [...
a2    {'C': {'b3': ['c8', 'c9', 'c10', 'c11', 'c12',...
dtype: object

>>> df.groupby(['A', 'B']).agg(list).groupby(level=0).apply(lambda df: df.xs(df.name).to_dict()).to_dict()

{'a1': {'C': {'b1': ['c1', 'c2', 'c3', 'c4'], 'b2': ['c5', 'c6', 'c7']}},
 'a2': {'C': {'b3': ['c8', 'c9', 'c10', 'c11', 'c12', 'c13'],
   'b4': ['c14', 'c15']}}}

# Finally, the dict comprehension drops the intermediate column-name key 'C'.
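
A possibly shorter variant of the same idea, offered as a sketch rather than a drop-in replacement: group by 'A' in a comprehension and aggregate 'C' into lists per 'B' inside each group, which avoids the extra 'C' level altogether. The frame `small_df` is a reduced sample invented for this example, and the column names are assumed to match the question.

```python
import pandas as pd

# Reduced sample with the same column layout as the question (assumption).
small_df = pd.DataFrame({
    'A': ['a1', 'a1', 'a1', 'a2', 'a2'],
    'B': ['b1', 'b1', 'b2', 'b3', 'b4'],
    'C': ['c1', 'c2', 'c3', 'c4', 'c5'],
})

# Outer groupby yields (key, sub-frame) pairs; the inner groupby collects
# the 'C' values of each 'B' group into a list.
nested = {
    a: g.groupby('B')['C'].agg(list).to_dict()
    for a, g in small_df.groupby('A')
}
print(nested)
```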

Upvotes: 4

anky

Reputation: 75080

Here is another way using pivot_table:

out = {k: v.dropna().to_dict() for k, v in
       df.pivot_table(values='C', index='B', columns='A', aggfunc=list).items()}

{'a1': {'b1': ['c1', 'c2', 'c3', 'c4'], 'b2': ['c5', 'c6', 'c7']},
 'a2': {'b3': ['c8', 'c9', 'c10', 'c11', 'c12', 'c13'], 'b4': ['c14', 'c15']}}

Upvotes: 5

Lambo

Reputation: 1174

Any enhancements will be welcome

dictLevel1 = {}
dictLevel2 = {}
for b,c in zip(list(df['B']),list(df['C'])):
    try:
        dictLevel2[b].append(c)
    except KeyError:
        dictLevel2[b] = [c]
for a,b in zip(list(df['A']),list(df['B'])):
    try:
        dictLevel1[a].update({b:dictLevel2[b]})
    except KeyError:
        dictLevel1[a] = {b: dictLevel2[b]}
print(dictLevel1)

Output:

{'a1': {'b1': ['c1', 'c2', 'c3', 'c4'], 'b2': ['c5', 'c6', 'c7']}, 'a2': {'b3': ['c8', 'c9', 'c10', 'c11', 'c12', 'c13'], 'b4': ['c14', 'c15']}}
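
One possible enhancement, sketched under the assumption that the columns are named 'A', 'B' and 'C' as in the question: a single pass with a nested `defaultdict` keyed on A and B together. The two-loop version above builds the lists per B value only, so if the same B value occurred under two different A values, their C values would be merged into one shared list; keying on both avoids that. The helper name `nest_by_columns` and the sample frame are hypothetical.

```python
from collections import defaultdict

import pandas as pd

def nest_by_columns(frame):
    # Hypothetical helper: builds {A: {B: [C, ...]}} in one pass.
    out = defaultdict(lambda: defaultdict(list))
    for a, b, c in zip(frame['A'], frame['B'], frame['C']):
        out[a][b].append(c)
    # Convert the defaultdicts back to plain dicts for a clean result.
    return {a: dict(inner) for a, inner in out.items()}

# Sample where 'b1' appears under both 'a1' and 'a2' (made up for this sketch).
sample = pd.DataFrame({
    'A': ['a1', 'a1', 'a2'],
    'B': ['b1', 'b1', 'b1'],
    'C': ['c1', 'c2', 'c3'],
})

result = nest_by_columns(sample)
print(result)
```

On the question's data, where no B value is shared between A values, this should produce the same dictionary as the loops above.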

Upvotes: 0
