TylerNG
TylerNG

Reputation: 941

Pandas group to others

I have a question about the syntax for grouping "other". For example,

df

Type  Start  End Count Total
A     x      a   1     3
A     x      b   1     3
A     x      c   1     3
A     y      A   2     4
A     y      b   1     4
A     y      c   1     4
B     x      A   1     6
B     x      b   2     6
B     x      c   3     6
B     y      a   3     6
B     y      b   2     6
B     y      c   1     6

Group by Type/Start/End columns, if End doesn't contain "a" or "A" then label them as "other"

Type  Start  End   Count Total
A     x      a     1     3
A     x      other 2     3
A     y      A     2     4
A     y      other 2     4
B     x      A     1     6
B     x      other 5     6
B     y      a     3     6
B     y      other 3     6

Upvotes: 1

Views: 572

Answers (3)

jezrael
jezrael

Reputation: 862406

I think you need replace all values not contains a and A to other by where and condition by isin and then use groupby by columns and Series s:

s = df['End'].where(df['End'].isin(['a','A']), 'other')
print (s)
0         a
1     other
2     other
3         A
4     other
5     other
6         A
7     other
8     other
9         a
10    other
11    other
Name: End, dtype: object    

df = (df.groupby(['Type', 'Start', s])
        .agg({'Count':'sum', 'Total':'mean'})
        .reset_index())

Another similar solution is replace column End and use your original solution for groupby + agg:

df['End'] = np.where(df['End'].isin(['a','A']), df['End'], 'other')
#alternatively
#df['End'] = df['End'].where(df['End'].isin(['a','A']), 'other')
df = (df.groupby(['Type', 'Start', 'End'], as_index=False)
        .agg({'Count':'sum', 'Total':'mean'}))

print (df)
  Type Start    End  Count  Total
0    A     x      a      1      3
1    A     x  other      2      3
2    A     y      A      2      4
3    A     y  other      2      4
4    B     x      A      1      6
5    B     x  other      5      6
6    B     y      a      3      6
7    B     y  other      3      6

Upvotes: 2

cs95
cs95

Reputation: 402263

You were almost there. The first two parameters to groupby are fine, but the last needs modification.

f = {'Count': 'sum', 'Total' : 'mean'}   
v = df.End.where(df.End.isin(['a', 'A']), 'other')

df.groupby(['Type', 'Start', v]).agg(f).reset_index()

  Type Start    End  Total  Count
0    A     x      a      3      1
1    A     x  other      3      2
2    A     y      A      4      2
3    A     y  other      4      2
4    B     x      A      6      1
5    B     x  other      6      5
6    B     y      a      6      3
7    B     y  other      6      3

Details

Use where/mask to alter df.End's values accordingly;

v = df.End.where(df.End.isin(['a', 'A']), 'other')

Or,

v = df.End.mask(~df.End.isin(['a', 'A']), 'other')

v

0         a
1     other
2     other
3         A
4     other
5     other
6         A
7     other
8     other
9         a
10    other
11    other
Name: End, dtype: object

Alternatively, lowercase the column and compare.

v = df.End.where(df.End.str.lower().eq('a'), 'other')

The rest, as they say, is history. If you're interested in preserving column order, slap on a reindex call at the end.

df.groupby(['Type', 'Start', v])\
  .agg(f)\
  .reset_index()\
  .reindex(columns=df.columns.tolist())

  Type Start    End  Count  Total
0    A     x      a      1      3
1    A     x  other      2      3
2    A     y      A      2      4
3    A     y  other      2      4
4    B     x      A      1      6
5    B     x  other      5      6
6    B     y      a      3      6
7    B     y  other      3      6

Upvotes: 2

Grr
Grr

Reputation: 16079

You can change the value of the entries in End to reflect your desired change and use groupby as you already described.

df.loc[~df.End.isin(['A', 'a']), 'End'] = 'other'
df.groupby(['Type','Start','End'']).agg({'Count':'sum','Tota‌​l':'mean'})

Upvotes: 2

Related Questions