Reputation: 941
I have a question about the syntax for grouping "other". For example,
df
Type Start End Count Total
A x a 1 3
A x b 1 3
A x c 1 3
A y A 2 4
A y b 1 4
A y c 1 4
B x A 1 6
B x b 2 6
B x c 3 6
B y a 3 6
B y b 2 6
B y c 1 6
Group by Type/Start/End columns, if End doesn't contain "a" or "A" then label them as "other"
Type Start End Count Total
A x a 1 3
A x other 2 3
A y A 2 4
A y other 2 4
B x A 1 6
B x other 5 6
B y a 3 6
B y other 3 6
Upvotes: 1
Views: 572
Reputation: 862406
I think you need replace all values not contains a
and A
to other
by where
and condition by isin
and then use groupby
by columns and Series s
:
s = df['End'].where(df['End'].isin(['a','A']), 'other')
print (s)
0 a
1 other
2 other
3 A
4 other
5 other
6 A
7 other
8 other
9 a
10 other
11 other
Name: End, dtype: object
df = (df.groupby(['Type', 'Start', s])
.agg({'Count':'sum', 'Total':'mean'})
.reset_index())
Another similar solution is replace column End
and use your original solution for groupby
+ agg
:
df['End'] = np.where(df['End'].isin(['a','A']), df['End'], 'other')
#alternatively
#df['End'] = df['End'].where(df['End'].isin(['a','A']), 'other')
df = (df.groupby(['Type', 'Start', 'End'], as_index=False)
.agg({'Count':'sum', 'Total':'mean'}))
print (df)
Type Start End Count Total
0 A x a 1 3
1 A x other 2 3
2 A y A 2 4
3 A y other 2 4
4 B x A 1 6
5 B x other 5 6
6 B y a 3 6
7 B y other 3 6
Upvotes: 2
Reputation: 402263
You were almost there. The first two parameters to groupby
are fine, but the last needs modification.
f = {'Count': 'sum', 'Total' : 'mean'}
v = df.End.where(df.End.isin(['a', 'A']), 'other')
df.groupby(['Type', 'Start', v]).agg(f).reset_index()
Type Start End Total Count
0 A x a 3 1
1 A x other 3 2
2 A y A 4 2
3 A y other 4 2
4 B x A 6 1
5 B x other 6 5
6 B y a 6 3
7 B y other 6 3
Details
Use where
/mask
to alter df.End
's values accordingly;
v = df.End.where(df.End.isin(['a', 'A']), 'other')
Or,
v = df.End.mask(~df.End.isin(['a', 'A']), 'other')
v
0 a
1 other
2 other
3 A
4 other
5 other
6 A
7 other
8 other
9 a
10 other
11 other
Name: End, dtype: object
Alternatively, lowercase the column and compare.
v = df.End.where(df.End.str.lower().eq('a'), 'other')
The rest, as they say, is history. If you're interested in preserving column order, slap on a reindex
call at the end.
df.groupby(['Type', 'Start', v])\
.agg(f)\
.reset_index()\
.reindex(columns=df.columns.tolist())
Type Start End Count Total
0 A x a 1 3
1 A x other 2 3
2 A y A 2 4
3 A y other 2 4
4 B x A 1 6
5 B x other 5 6
6 B y a 3 6
7 B y other 3 6
Upvotes: 2
Reputation: 16079
You can change the value of the entries in End
to reflect your desired change and use groupby
as you already described.
df.loc[~df.End.isin(['A', 'a']), 'End'] = 'other'
df.groupby(['Type','Start','End'']).agg({'Count':'sum','Total':'mean'})
Upvotes: 2