Reputation: 17
I have a dataframe with a column that has many repeated values in one column. I want to create another dataframe that has only one instance of that value paired with the average of it's matching values from another column of the original dataframe. Example:
data = [[1,1],[1,2],[2,2],[3,3],[3,1],[2,3],[1,5],[2,7],[3,9]]
df = pd.DataFrame(data, columns = ['A', 'B'])
A B
0 1 1
1 1 2
2 2 2
3 3 3
4 3 1
5 2 3
6 1 5
7 2 7
8 3 9
I want to average column B values that match the same column A values. then have a new dataframe that look like this:
A B
0 1 2.7
1 2 4
2 3 4.3
I did this by looping:
df2 = pdDataFrame(columns = ['A','B'])
uni = df.A.unique()
for x in uni:
av = df.loc[(df['A'] == x, 'B')].mean()
df2 = df2.append(pd.DataFrame([[x, av]], columns = ['A', 'B']))
I know it is not good form to loop through an DataFrame and this process took forever. It also resulted in a DataFrame without an index (they were all 0). What is a more appropriate and efficient method to do this. Thank you in advance!
Upvotes: 1
Views: 37
Reputation: 3565
Simply:
df.groupby('A')[['B']].mean().reset_index()
A B
0 1 2.666667
1 2 4.000000
2 3 4.333333
If you want B
rounded:
df.groupby('A')[['B']].mean().round(1).reset_index()
A B
0 1 2.7
1 2 4.0
2 3 4.3
Or even this way, not using A
as index, with argument as_index=False
:
df.groupby('A', as_index=False)[['B']].mean().round(1)
A B
0 1 2.7
1 2 4.0
2 3 4.3
Upvotes: 1
Reputation: 14191
You may group your table by the column "A"
and compute the mean for every such a group:
df.groupby("A").mean()
The result:
B A 1 2.666667 2 4.000000 3 4.333333
Upvotes: 0