Reputation: 95
I have a data-frame which I'm using the pandas.groupby
on a specific column and then running aggregate statistics on the produced groups (mean, median, count). I want to treat certain column values as members of the same group produced by the groupby rather than a distinct group per distinct value in the column which was used for the grouping. I was looking how I would accomplish such a thing.
For example:
>> my_df
ID SUB_NUM ELAPSED_TIME
1 1 1.7
2 2 1.4
3 2 2.1
4 4 3.0
5 6 1.8
6 6 1.2
So instead of the typical behavior:
>> my_df.groupby([SUB_NUM]).agg([count])
ID SUB_NUM Count
1 1 1
2 2 2
4 4 1
5 6 2
I want certain values (SUB_NUM in [1, 2]) to be computed as one group so instead something like below is produced:
>> # Some mystery pandas function calls
ID SUB_NUM Count
1 1, 2 3
4 4 1
5 6 2
Any help would be much appreciated, thanks!
Upvotes: 3
Views: 2440
Reputation: 17911
As the by
argument in groupby
you can use a function that accepts a dataframe's index and returns group names. Maybe it doesn't have many advantages in this example, but it is worthwhile in more complex logic:
def func(idx):
if idx in (1, 2):
return 1
else:
return idx
df.set_index('SUB_NUM', drop=False, inplace=True) # we want to groupby by this column
df.groupby(func).agg({'ID': 'first', 'SUB_NUM': [set, 'count']}).reset_index(drop=True)
Output:
ID SUB_NUM
first set count
0 1 {1, 2} 3
1 4 {4} 1
2 5 {6} 2
Upvotes: 0
Reputation: 863791
For me works:
#for join values convert values to string
df['SUB_NUM'] = df['SUB_NUM'].astype(str)
#create mapping dict by dict comprehension
L = ['1','2']
d = {x: ','.join(L) for x in L}
print (d)
{'2': '1,2', '1': '1,2'}
#replace values by dict
a = df['SUB_NUM'].replace(d)
print (a)
0 1,2
1 1,2
2 1,2
3 4
4 6
5 6
Name: SUB_NUM, dtype: object
#groupby by mapping column and aggregating `first` and `size`
print (df.groupby(a)
.agg({'ID':'first', 'ELAPSED_TIME':'size'})
.rename(columns={'ELAPSED_TIME':'Count'})
.reset_index())
SUB_NUM ID Count
0 1,2 1 3
1 4 4 1
2 6 5 2
What is the difference between size and count in pandas?
Upvotes: 5
Reputation: 3777
You can create another column mapping the SUB_NUM
values to actual groups and then group by it.
my_df['SUB_GROUP'] = my_df['SUB_NUM'].apply(lambda x: 1 if x < 3 else x)
my_df.groupby(['SUB_GROUP']).agg([count])
Upvotes: 1