Reputation: 640
I would like to calculate the mean across multiple columns using groupby
. Below is a toy example
df = pd.DataFrame({'company': ['dell', 'microsoft', 'toshiba', 'apple'],
'measure': ['sales', 'speed', 'wait time', 'service'], 'category': ['laptop',
'tablet', 'smartphone', 'desktop'], '10/6/2015': [234, 333, 456, 290],
'10/13/2015': [134, 154, 123, 177], '10/20/2015': [57, 57, 63, 71]})
I would like to calculate the average for each row across the date columns in df
. I figured that the best way to use groupby
would be to change the column names so that they are non-unique for each month, like so:
def maybe_rename(col_name):
if re.match('\\d+/\\d+/\\d+', col_name):
return re.split('/', col_name)[0] + re.split('/', col_name)[2]
else:
return col_name
df = df.rename(columns = maybe_rename)
df
company measure category 102015 102015 102015
0 dell sales laptop 234 134 57
1 microsoft speed tablet 333 154 57
2 toshiba wait time smartphone 456 123 63
3 apple service desktop 290 177 71
Then I tried to compute the mean
like so:
df = df.groupby(df.columns, axis = 1).mean()
Which returned the following error: DataError: No numeric types to aggregate
How do I get around this? My desired result is below:
df
company measure category 102015
0 dell sales laptop 141.66
1 microsoft speed tablet 181.33
2 toshiba wait time smartphone 214.0
3 apple service desktop 79.33
Upvotes: 0
Views: 707
Reputation: 338
Try this:
import pandas as pd
df = pd.DataFrame({'company': ['dell', 'microsoft', 'toshiba', 'apple'],
'measure': ['sales', 'speed', 'wait time', 'service'], 'category': ['laptop',
'tablet', 'smartphone', 'desktop'], '10/6/2015': [234, 333, 456, 290],
'10/13/2015': [134, 154, 123, 177], '10/20/2015': [57, 57, 63, 71]})
columns_to_average = ['10/6/2015','10/20/2015','10/13/2015']
df['means'] = df[columns_to_average].mean(axis=1)
I would suggest transforming it into time series data if you have many date columns...
tdf = df[['category','10/6/2015','10/20/2015','10/13/2015']].transpose()
tdf = tdf.rename(columns=tdf.iloc[0]).drop(tdf.index[0])
print(tdf['laptop'].mean())
Upvotes: 1