Reputation: 1679
I have a df like this:
X y
a 0
a 0
a 0
a 1
b 1
b 1
a 2
b 2
c 2
I want to group df by df.y. In this grouping, I want to aggregate df.X in a specific way:
It should look like this:
X y
a 0
b 1
[b,c] 2
When I run:
mask1 = s[['b','c']].ge(0.25).any(1)
s1 = np.where(s['b']==s['c'], {'b','c'}, s[['b','c']].idxmax(1))
pd.Series(np.where(mask1, s1, 'a'), index=s.index)
I get this error:
---------------------------------------------------------------------------
KeyError Traceback (most recent call last)
<ipython-input-162-c6e6610b3c30> in <module>
----> 1 mask1 = s[['b','c']].ge(0.25).any(1)
2
3 s1 = np.where(s['b']==s['c'], {'b','c'}, s[['b','c']].idxmax(1))
4
5 pd.Series(np.where(mask1, s1, 'a'), index=s.index)
~/anaconda3/envs/cv2/lib/python3.6/site-packages/pandas/core/frame.py in __getitem__(self, key)
2903 if is_iterator(key):
2904 key = list(key)
-> 2905 indexer = self.loc._get_listlike_indexer(key, axis=1, raise_missing=True)[1]
2906
2907 # take() does not accept boolean indexers
~/anaconda3/envs/cv2/lib/python3.6/site-packages/pandas/core/indexing.py in _get_listlike_indexer(self, key, axis, raise_missing)
1252 keyarr, indexer, new_indexer = ax._reindex_non_unique(keyarr)
1253
-> 1254 self._validate_read_indexer(keyarr, indexer, axis, raise_missing=raise_missing)
1255 return keyarr, indexer
1256
~/anaconda3/envs/cv2/lib/python3.6/site-packages/pandas/core/indexing.py in _validate_read_indexer(self, key, indexer, axis, raise_missing)
1296 if missing == len(indexer):
1297 axis_name = self.obj._get_axis_name(axis)
-> 1298 raise KeyError(f"None of [{key}] are in the [{axis_name}]")
1299
1300 # We (temporarily) allow for some missing keys with .loc, except in
KeyError: "None of [Index(['b', 'c'], dtype='object')] are in the [columns]"
Upvotes: 0
Views: 50
Reputation: 150785
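It's pretty straightforward from your logic. First build s, the share of each X value within every y group (one column per X value), then apply your rules to it: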
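import numpy as np
import pandas as pd
# Share of each X value within every y group, one column per X value
s = (df.groupby('y')['X'].value_counts(normalize=True)
       .unstack('X', fill_value=0)
    )
# True where b or c makes up at least 25% of the group
mask1 = s[['b','c']].ge(0.25).any(axis=1)
# On a b/c tie keep both labels, otherwise take the more frequent one
s1 = np.where(s['b']==s['c'], {'b','c'}, s[['b','c']].idxmax(axis=1))
pd.Series(np.where(mask1, s1, 'a'), index=s.index)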
Output:
y
0 a
1 b
2 {c, b}
dtype: object
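For reference, here is a self-contained sketch of the same approach, run on the sample frame from the question. The function name `aggregate_x`, its `labels` parameter, and the `reindex` step are assumptions added for illustration, not part of the original logic. The `reindex` is one possible guard against the `KeyError` shown in the question, which is raised when the table has no `b` or `c` column (for example when those values never occur in the data).

```python
import numpy as np
import pandas as pd

# Sample frame from the question.
df = pd.DataFrame({
    "X": list("aaaabbabc"),
    "y": [0, 0, 0, 1, 1, 1, 2, 2, 2],
})


def aggregate_x(frame, labels=("a", "b", "c")):
    """Hypothetical helper: pick one label (or a set of labels) per y group."""
    # Share of each X value within every y group, one column per X value.
    shares = (
        frame.groupby("y")["X"]
        .value_counts(normalize=True)
        .unstack("X", fill_value=0)
        # Assumption: make sure every expected column exists, so that
        # selecting ['b', 'c'] cannot raise a KeyError.
        .reindex(columns=list(labels), fill_value=0)
    )
    # True where b or c makes up at least 25% of the group.
    has_b_or_c = shares[["b", "c"]].ge(0.25).any(axis=1)
    # On a b/c tie keep both labels, otherwise take the more frequent one.
    picked = np.where(
        shares["b"].eq(shares["c"]),
        {"b", "c"},
        shares[["b", "c"]].idxmax(axis=1),
    )
    # Fall back to 'a' when neither b nor c reaches the threshold.
    return pd.Series(np.where(has_b_or_c, picked, "a"), index=shares.index)


result = aggregate_x(df)
print(result)

# A frame without any 'c' rows still works thanks to the reindex step.
result_no_c = aggregate_x(df[df["X"] != "c"])
print(result_no_c)
```

If the `KeyError` still appears with your own data, printing `s.columns` before the `mask1` line should show which labels are actually present.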
Upvotes: 1