Reputation: 18786
I have a dataframe of the form
col1 sum
801 1
802 2
391 3
701 5
I want to group by the leading digit of col1 and take the mean of sum.
Basically, the result should be
col1 sum
8 1.5
3 3
7 5
What I have tried is
def group_condition(col1):
    col1 = str(col1)
    if col1.startswith('8'):
        return 'y'
    else:
        return 'n'
augmented_error_table[[sum]].groupby(augmented_error_table[col1].groupby(group_condition).groups).mean()
But it doesn't work; it gives me an empty dataframe.
Upvotes: 0
Views: 114
Reputation: 30605
Use astype(str) in groupby, like this:
df.groupby(df['col1'].astype(str).str[0])['sum'].mean()
Output:
sum
col1
3 3.0
7 5.0
8 1.5
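For reference, here is a minimal self-contained sketch of the same idea, using the data from the question. The variable names first_digit and result are only illustrative. It assumes col1 holds non-negative integers; with negative values the first character would be the minus sign.

```python
import pandas as pd

# Data copied from the question.
df = pd.DataFrame({"col1": [801, 802, 391, 701], "sum": [1, 2, 3, 5]})

# Group key: the first character of col1 rendered as a string.
first_digit = df["col1"].astype(str).str[0]

# Mean of "sum" per leading digit; the result is a Series indexed by the key.
result = df.groupby(first_digit)["sum"].mean()
print(result)
```

Note that the group keys are strings ('3', '7', '8'), not integers. If integer keys are needed, the key Series can be converted with .astype(int) before grouping.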
Upvotes: 2
Reputation: 24752
import pandas as pd
import numpy as np
df = pd.DataFrame(dict(col1=[801,802,391,701], sum=[1,2,3,5]))
# work out initial digit by list comprehension
df['init_digit'] = [str(x)[0] for x in df.col1]
# use groupby, agg function apply to sum column only
df.groupby(['init_digit']).agg({'sum': 'mean'})
Out[23]:
sum
init_digit
3 3.0
7 5.0
8 1.5
Upvotes: 0
Reputation: 36555
I think the problem is that when groupby is given a function, it calls it on the index labels, not on the column values.
It needs a Series of group keys instead, something like this:
table.groupby(table['col1'].map(group_condition))
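To illustrate the difference, here is a small sketch using the question's data. The names by_index, keys and by_value are made up for the example, and group_condition is a condensed version of the function from the question.

```python
import pandas as pd

# Data copied from the question.
table = pd.DataFrame({"col1": [801, 802, 391, 701], "sum": [1, 2, 3, 5]})

def group_condition(value):
    """Return 'y' when the value starts with the digit 8, else 'n'."""
    return "y" if str(value).startswith("8") else "n"

# Passing the function directly: pandas calls it on the index labels
# (0, 1, 2, 3), so every row ends up in the single group 'n'.
by_index = table.groupby(group_condition)["sum"].mean()

# Mapping the function over the column values first gives a Series of
# keys, one per row, which groupby can use directly.
keys = table["col1"].map(group_condition)
by_value = table.groupby(keys)["sum"].mean()

print(by_index)
print(by_value)
```

With the function passed directly, all four rows fall into one group. With the mapped Series, the rows starting with 8 are separated from the rest.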
Upvotes: 0