Merlin
Merlin

Reputation: 25629

Python Pandas : How to return grouped lists in a column as a dict

Python Pandas : How to compile all lists in a column into one unique list

Starting with data from previous question:

f = pd.DataFrame({'id':['a','b', 'a'], 'val':[['val1','val2'],
                                               ['val33','val9','val6'],
                                               ['val2','val6','val7']]})

print (df)
  id                  val
0  a         [val1, val2]
1  b  [val33, val9, val6]
2  a   [val2, val6, val7]

How do I get the lists into Dict:

pd.Series([a for b in df.val.tolist() for a in b]).value_counts().to_dict()
{'val1': 1, 'val2': 2, 'val33': 1, 'val6': 2, 'val7': 1, 'val9': 1}

How do I get the lists by groups:

df.groupby('id')["val"].apply(lambda x: (list([a for b in x.tolist() for a in b])) )

id
a    [val1, val2, val2, val6, val7]
b               [val33, val9, val6]
Name: val, dtype: object

How do I get the lists by groups as dicts:

df.groupby('id')["val"].apply(lambda x: pd.Series([a for b in x.tolist() for a in b]).value_counts().to_dict() )

Returns:

id       
a   val1     1.0
    val2     2.0
    val6     1.0
    val7     1.0
b   val33    1.0
    val6     1.0
    val9     1.0
Name: val, dtype: float64

Desired output What am I overlooking? :

   id
   a     {'val1': 1, 'val2': 2, 'val6': 2, 'val7': 1}
   b     {'val33': 1, 'val6': 1,  'val9': 1}
   Name: val, dtype: object

Upvotes: 1

Views: 502

Answers (2)

Javier
Javier

Reputation: 420

Edited using agg from @ayhan (much faster than apply).

from collections import Counter
df.groupby("id")["val"].agg(lambda x: Counter([a for b in x for a in b]))

Out:

id
a    {'val2': 2, 'val6': 1, 'val7': 1, 'val1': 1}
b              {'val9': 1, 'val33': 1, 'val6': 1}
Name: val, dtype: object

Time of this version:

%timeit df.groupby("id")["val"].agg(lambda x: Counter([a for b in x for a in b]))

1000 loops, best of 3: 820 µs per loop

Time of @ayhan version:

%timeit  df.groupby('id')["val"].agg(lambda x: pd.Series([a for b in x.tolist() for a in b]).value_counts().to_dict() )

100 loops, best of 3: 1.91 ms per loo

Upvotes: 1

user2285236
user2285236

Reputation:

Apply is flexible. Whenever possible, it converts the returning object to something that is more usable. From the docs:

Some operations on the grouped data might not fit into either the aggregate or transform categories. Or, you may simply want GroupBy to infer how to combine the results. For these, use the apply function, which can be substituted for both aggregate and transform in many standard use cases.

Note: apply can act as a reducer, transformer, or filter function, depending on exactly what is passed to apply. So depending on the path taken, and exactly what you are grouping. Thus the grouped columns(s) may be included in the output as well as set the indices.

There can be cases, like this, that you want to avoid this behavior. If you are grouping, simply replace apply with agg:

df.groupby('id')["val"].agg(lambda x: pd.Series([a for b in x.tolist() for a in b]).value_counts().to_dict() )
Out: 
id
a    {'val1': 1, 'val7': 1, 'val6': 1, 'val2': 2}
b              {'val6': 1, 'val33': 1, 'val9': 1}
Name: val, dtype: object

Upvotes: 1

Related Questions