nate robo
nate robo

Reputation: 95

Pandas: Using group by, combine multiple column values as one distinct group within the groupby

I have a data-frame which I'm using the pandas.groupby on a specific column and then running aggregate statistics on the produced groups (mean, median, count). I want to treat certain column values as members of the same group produced by the groupby rather than a distinct group per distinct value in the column which was used for the grouping. I was looking how I would accomplish such a thing.

For example:

>> my_df
ID    SUB_NUM     ELAPSED_TIME
1     1           1.7
2     2           1.4
3     2           2.1 
4     4           3.0
5     6           1.8
6     6           1.2

So instead of the typical behavior:

>> my_df.groupby([SUB_NUM]).agg([count])
ID    SUB_NUM     Count
1     1           1
2     2           2
4     4           1
5     6           2 

I want certain values (SUB_NUM in [1, 2]) to be computed as one group so instead something like below is produced:

>> # Some mystery pandas function calls
ID    SUB_NUM     Count
1     1, 2        3
4     4           1
5     6           2

Any help would be much appreciated, thanks!

Upvotes: 3

Views: 2440

Answers (3)

Mykola Zotko
Mykola Zotko

Reputation: 17911

As the by argument in groupby you can use a function that accepts a dataframe's index and returns group names. Maybe it doesn't have many advantages in this example, but it is worthwhile in more complex logic:

def func(idx):
    if idx in (1, 2):
        return 1
    else:
        return idx

df.set_index('SUB_NUM', drop=False, inplace=True) # we want to groupby by this column
df.groupby(func).agg({'ID': 'first', 'SUB_NUM': [set, 'count']}).reset_index(drop=True)

Output:

     ID SUB_NUM      
  first     set count
0     1  {1, 2}     3
1     4     {4}     1
2     5     {6}     2

Upvotes: 0

jezrael
jezrael

Reputation: 863791

For me works:

#for join values convert values to string
df['SUB_NUM'] = df['SUB_NUM'].astype(str)
#create mapping dict by dict comprehension
L = ['1','2']
d = {x: ','.join(L) for x in L}
print (d)
{'2': '1,2', '1': '1,2'}

#replace values by dict
a = df['SUB_NUM'].replace(d)
print (a)
0    1,2
1    1,2
2    1,2
3      4
4      6
5      6
Name: SUB_NUM, dtype: object


#groupby by mapping column and aggregating `first` and `size`
print (df.groupby(a)
         .agg({'ID':'first', 'ELAPSED_TIME':'size'})
         .rename(columns={'ELAPSED_TIME':'Count'})
         .reset_index())

  SUB_NUM  ID  Count
0     1,2   1      3
1       4   4      1
2       6   5      2

What is the difference between size and count in pandas?

Upvotes: 5

matusko
matusko

Reputation: 3777

You can create another column mapping the SUB_NUM values to actual groups and then group by it.

my_df['SUB_GROUP'] = my_df['SUB_NUM'].apply(lambda x: 1 if x < 3 else x)
my_df.groupby(['SUB_GROUP']).agg([count])

Upvotes: 1

Related Questions