Reputation: 105
I don't know why I'm struggling so hard with this one. I'm trying to do the excel equivalent of an averageifs calc across a pandas dataframe.
I have the following:
df = pd.DataFrame(rng.rand(1000, 7), columns=['1/31/2019', '2/28/2019', '3/31/2019', '4/30/2019', '5/31/2019', '6/30/2019', '7/31/2019'])
I also have a column:
df['Doc_Number'] = ['A', 'B', 'C', 'B', 'C', 'B', 'A', 'A', 'D', 'G', 'G', 'D', 'G', 'B' ...]
I want to do the excel equivalent of averageifs on the Doc_Number on each column of the df while maintaining the structure of the dataframe. So in each column, I'd calc the mean if df['Doc_Number'] = ['A', 'B', 'C'...] but I'd still maintain the 1,000 rows and I'd apply the calc to each individual column ['1/31/2019', '2/28/2019', '3/31/2019' ...].
For a single column, I would do something like:
df['AverageIfs'] = df.groupby('Doc_Number')['1/31/2019'].transform('np.mean')
But how would you apply the calc to each column of the df? In reality, I have many more columns to apply the calc across.
I'm a complete amateur so thanks for putting up with my questions.
Upvotes: 1
Views: 118
Reputation: 862851
You can remove ['1/31/2019']
after groupby for process all columns to new DataFramme
, change columns names with add_suffix
and add to original by join
:
#simplify df for easy check output
np.random.seed(123)
df = pd.DataFrame(np.random.rand(14, 2), columns=['1/31/2019', '2/28/2019'])
df['Doc_Number'] = ['A', 'B', 'C', 'B', 'C', 'B', 'A', 'A', 'D', 'G', 'G', 'D', 'G', 'B']
print (df)
1/31/2019 2/28/2019 Doc_Number
0 0.696469 0.286139 A
1 0.226851 0.551315 B
2 0.719469 0.423106 C
3 0.980764 0.684830 B
4 0.480932 0.392118 C
5 0.343178 0.729050 B
6 0.438572 0.059678 A
7 0.398044 0.737995 A
8 0.182492 0.175452 D
9 0.531551 0.531828 G
10 0.634401 0.849432 G
11 0.724455 0.611024 D
12 0.722443 0.322959 G
13 0.361789 0.228263 B
df = df.join(df.groupby('Doc_Number').transform('mean').add_suffix('_mean'))
print (df)
1/31/2019 2/28/2019 Doc_Number 1/31/2019_mean 2/28/2019_mean
0 0.696469 0.286139 A 0.511029 0.361271
1 0.226851 0.551315 B 0.478146 0.548364
2 0.719469 0.423106 C 0.600200 0.407612
3 0.980764 0.684830 B 0.478146 0.548364
4 0.480932 0.392118 C 0.600200 0.407612
5 0.343178 0.729050 B 0.478146 0.548364
6 0.438572 0.059678 A 0.511029 0.361271
7 0.398044 0.737995 A 0.511029 0.361271
8 0.182492 0.175452 D 0.453474 0.393238
9 0.531551 0.531828 G 0.629465 0.568073
10 0.634401 0.849432 G 0.629465 0.568073
11 0.724455 0.611024 D 0.453474 0.393238
12 0.722443 0.322959 G 0.629465 0.568073
13 0.361789 0.228263 B 0.478146 0.548364
Upvotes: 2