Reputation: 25629
Python Pandas : How to compile all lists in a column into one unique list
Starting with data from previous question:
f = pd.DataFrame({'id':['a','b', 'a'], 'val':[['val1','val2'],
['val33','val9','val6'],
['val2','val6','val7']]})
print (df)
id val
0 a [val1, val2]
1 b [val33, val9, val6]
2 a [val2, val6, val7]
How do I get the lists into Dict:
pd.Series([a for b in df.val.tolist() for a in b]).value_counts().to_dict()
{'val1': 1, 'val2': 2, 'val33': 1, 'val6': 2, 'val7': 1, 'val9': 1}
How do I get the lists by groups:
df.groupby('id')["val"].apply(lambda x: (list([a for b in x.tolist() for a in b]))
)
id
a [val1, val2, val2, val6, val7]
b [val33, val9, val6]
Name: val, dtype: object
How do I get the lists by groups as dicts:
df.groupby('id')["val"].apply(lambda x: pd.Series([a for b in x.tolist() for a in b]).value_counts().to_dict() )
Returns:
id
a val1 1.0
val2 2.0
val6 1.0
val7 1.0
b val33 1.0
val6 1.0
val9 1.0
Name: val, dtype: float64
Desired output What am I overlooking? :
id
a {'val1': 1, 'val2': 2, 'val6': 2, 'val7': 1}
b {'val33': 1, 'val6': 1, 'val9': 1}
Name: val, dtype: object
Upvotes: 1
Views: 502
Reputation: 420
Edited using agg
from @ayhan (much faster than apply).
from collections import Counter
df.groupby("id")["val"].agg(lambda x: Counter([a for b in x for a in b]))
Out:
id
a {'val2': 2, 'val6': 1, 'val7': 1, 'val1': 1}
b {'val9': 1, 'val33': 1, 'val6': 1}
Name: val, dtype: object
Time of this version:
%timeit df.groupby("id")["val"].agg(lambda x: Counter([a for b in x for a in b]))
1000 loops, best of 3: 820 µs per loop
Time of @ayhan version:
%timeit df.groupby('id')["val"].agg(lambda x: pd.Series([a for b in x.tolist() for a in b]).value_counts().to_dict() )
100 loops, best of 3: 1.91 ms per loo
Upvotes: 1
Reputation:
Apply is flexible. Whenever possible, it converts the returning object to something that is more usable. From the docs:
Some operations on the grouped data might not fit into either the aggregate or transform categories. Or, you may simply want GroupBy to infer how to combine the results. For these, use the apply function, which can be substituted for both aggregate and transform in many standard use cases.
Note: apply can act as a reducer, transformer, or filter function, depending on exactly what is passed to apply. So depending on the path taken, and exactly what you are grouping. Thus the grouped columns(s) may be included in the output as well as set the indices.
There can be cases, like this, that you want to avoid this behavior. If you are grouping, simply replace apply with agg:
df.groupby('id')["val"].agg(lambda x: pd.Series([a for b in x.tolist() for a in b]).value_counts().to_dict() )
Out:
id
a {'val1': 1, 'val7': 1, 'val6': 1, 'val2': 2}
b {'val6': 1, 'val33': 1, 'val9': 1}
Name: val, dtype: object
Upvotes: 1