Average values of one column based on the values of another

Question

I have a dataframe with a column that has many repeated values in one column. I want to create another dataframe that has only one instance of that value paired with the average of it's matching values from another column of the original dataframe. Example:

data = [[1,1],[1,2],[2,2],[3,3],[3,1],[2,3],[1,5],[2,7],[3,9]]

df = pd.DataFrame(data, columns = ['A', 'B'])
   A  B
0  1  1
1  1  2
2  2  2
3  3  3
4  3  1
5  2  3
6  1  5
7  2  7
8  3  9

I want to average column B values that match the same column A values. then have a new dataframe that look like this:

I did this by looping:

df2 = pdDataFrame(columns = ['A','B'])
uni = df.A.unique()
for x in uni:
    av = df.loc[(df['A'] == x, 'B')].mean()
    df2 = df2.append(pd.DataFrame([[x, av]], columns = ['A', 'B']))

I know it is not good form to loop through an DataFrame and this process took forever. It also resulted in a DataFrame without an index (they were all 0). What is a more appropriate and efficient method to do this. Thank you in advance!

igorkf · Accepted Answer

Simply:

df.groupby('A')[['B']].mean().reset_index()
    A      B
0   1   2.666667
1   2   4.000000
2   3   4.333333

If you want B rounded:

df.groupby('A')[['B']].mean().round(1).reset_index()
    A    B
0   1   2.7
1   2   4.0
2   3   4.3

Or even this way, not using A as index, with argument as_index=False:

df.groupby('A', as_index=False)[['B']].mean().round(1)
    A    B
0   1   2.7
1   2   4.0
2   3   4.3

Average values of one column based on the values of another

Answers (2)

Related Questions