Reputation: 483
I would be grateful if someone could tell me how to do the task below. Suppose I have a pandas DataFrame in Python as follows:
col1 col2 col3 col4
0 A 2001 2 5
1 A 2001 2 4
2 A 2001 3 6
3 A 2002 4 5
4 B 2001 2 9
5 B 2001 2 4
6 B 2001 2 3
7 B 2001 3 95
I want to get the mean of col4 for rows whose values in col1, col2, and col3 are all the same, and then drop the rows that repeat those first three columns. For example, the values of col1, col2, and col3 in the first two rows are the same, so we want to keep only one of them and set col4 to the mean of 5 and 4. The result should be:
col1 col2 col3 col4
0 A 2001 2 4.5
1 A 2001 3 6
2 A 2002 4 5
3 B 2001 2 5.33
4 B 2001 3 95
Upvotes: 1
Views: 68
Reputation: 71600
Use groupby to group by 'col1', 'col2', and 'col3', then take the mean of the 'col4' column:
print(df.groupby(['col1','col2','col3'],as_index=False)['col4'].mean())
Output:
col1 col2 col3 col4
0 A 2001 2 4.500000
1 A 2001 3 6.000000
2 A 2002 4 5.000000
3 B 2001 2 5.333333
4 B 2001 3 95.000000
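For anyone who wants to reproduce this end to end, here is a minimal sketch that rebuilds the sample frame from the question and applies the same groupby call. The variable names result and result_rounded, and the rounding step, are my own additions for illustration and not part of the original answer.

```python
import pandas as pd

# Rebuild the sample data from the question.
df = pd.DataFrame({
    "col1": ["A", "A", "A", "A", "B", "B", "B", "B"],
    "col2": [2001, 2001, 2001, 2002, 2001, 2001, 2001, 2001],
    "col3": [2, 2, 3, 4, 2, 2, 2, 3],
    "col4": [5, 4, 6, 5, 9, 4, 3, 95],
})

# Group on the three key columns and average col4 within each group.
# as_index=False keeps the keys as ordinary columns instead of a MultiIndex.
result = df.groupby(["col1", "col2", "col3"], as_index=False)["col4"].mean()

# Optional (an assumption about the desired display): round col4 to two decimals.
result_rounded = result.round({"col4": 2})
print(result_rounded)
```

Because the grouping already collapses each key combination to a single row, there is no need for a separate drop_duplicates step afterwards.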
Upvotes: 1