Access dictionary keys and values in pandas dataframe column

Question

I've got a simple dataframe with a column populated by a python dictionary, in the form:

User                          CLang
111                      {u'en': 1}
112            {u'en': 1, u'es': 1}
112            {u'en': 1, u'es': 1}
113  {u'zh': 1, u'ja': 1, u'es': 2}
113  {u'zh': 1, u'ja': 1, u'es': 2}
113  {u'zh': 1, u'ja': 1, u'es': 2}
114                      {u'es': 1}
113  {u'zh': 1, u'ja': 1, u'es': 2}

The CLang column contains the frequency of different values for each user. How may I have access to single keys and values of the CLang column? For instance I would like to groupby the User and the most frequent value inside the dictionary, in a form like:

g = df.groupby(['User','CLang')

counting then the number of occurrences for each value:

d = g.size().unstack().fillna(0)

The resulting dataframe would appear as:

DLang  en  es
User                 
111     1   0
112     1   1
113     0   4
114     0   1

paulo.filip3 · Accepted Answer

I'm not completely sure I understood correctly what you want your output to be and also I don't think using dict in pandas.DataFrame is a very good idea in general.

Reshaping your DataFrame to something more pandas-like would be better, you would then be able to use pandas methods to solve this problem.

Anyway, if you really want to do it, here's a (not very elegant) way:

In [1]: import pandas as pd
In [2]: l1 = [111, 112, 112, 113, 113, 113, 114, 113]
In [3]: l2 = [{'en': 1},
              {'en': 1, 'es': 1},
              {'en': 1, 'es': 1},
              {'es': 2, 'ja': 1, 'zh': 1},
              {'es': 2, 'ja': 1, 'zh': 1},
              {'es': 2, 'ja': 1, 'zh': 1},
              {'es': 1},
              {'es': 2, 'ja': 1, 'zh': 1}]

In [4]: df = pd.DataFrame({'User': l1, 'CLang': l2})

In [5]: df
Out[5]: 
   User                           CLang
0   111                      {u'en': 1}
1   112            {u'en': 1, u'es': 1}
2   112            {u'en': 1, u'es': 1}
3   113  {u'zh': 1, u'ja': 1, u'es': 2}
4   113  {u'zh': 1, u'ja': 1, u'es': 2}
5   113  {u'zh': 1, u'ja': 1, u'es': 2}
6   114                      {u'es': 1}
7   113  {u'zh': 1, u'ja': 1, u'es': 2}

In [6]: def whatever(row):
  ....:     tmp_d = {}
  ....:     for d in row.values:
  ....:         for k in d.keys():
  ....:             if k in tmp_d.keys():
  ....:                 tmp_d[k] += 1
  ....:             else:
  ....:                 tmp_d[k] = 1
  ....:     return tmp_d

In [7]: new_df = df.groupby('User')['CLang'].apply(whatever).unstack().fillna(0)

In [8]: new_df
Out[8]: 
      en  es  ja  zh
User                
111    1   0   0   0
112    2   2   0   0
113    0   4   4   4
114    0   1   0   0

If you then want to know what was the CLang with more occurrences you can, also not very elegantly since list in DataFrame should be avoided, do:

In [9]: def whatever2(row):                                                  
  ....:     tmp_d = {}
  ....:     for i, v in zip(row.index, row.values):
  ....:         if v in tmp_d.keys():
  ....:             tmp_d[v].append(i)
  ....:         else:
  ....:             tmp_d[v] = [i]
  ....:     highest = max(tmp_d.keys())
  ....:     return tmp_d[highest]

In [10]: new_df['Most_Used_CLang'] = new_df.apply(whatever2, axis=1)
In [11]: new_df
Out[11]:
      en  es  ja  zh Most_Used_CLang
User                                
111    1   0   0   0            [en]
112    2   2   0   0        [en, es]
113    0   4   4   4    [es, ja, zh]
114    0   1   0   0            [es]

Access dictionary keys and values in pandas dataframe column

Answers (2)

Related Questions