Monirrad

Reputation: 483

How to get the mean of the values in one column based on matching values in other columns

I would be thankful if someone could tell me how to do the task below. Suppose that I have a DataFrame in Python as follows:

  col1 col2 col3 col4
0    A 2001    2    5
1    A 2001    2    4
2    A 2001    3    6
3    A 2002    4    5
4    B 2001    2    9
5    B 2001    2    4
6    B 2001    2    3
7    B 2001    3   95
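For anyone who wants to reproduce this, here is a minimal sketch that rebuilds the frame above in pandas. It assumes col2 holds integer years and col3/col4 hold integers, since the question only shows the printed table:

```python
import pandas as pd

# Rebuild the example frame from the question (integer dtypes assumed).
df = pd.DataFrame({
    'col1': ['A', 'A', 'A', 'A', 'B', 'B', 'B', 'B'],
    'col2': [2001, 2001, 2001, 2002, 2001, 2001, 2001, 2001],
    'col3': [2, 2, 3, 4, 2, 2, 2, 3],
    'col4': [5, 4, 6, 5, 9, 4, 3, 95],
})
print(df)
```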

I want to take the mean of col4 over the rows whose values in col1, col2, and col3 are the same, and then get rid of the rows with repeated values in those first three columns. For example, the values of col1, col2, and col3 in the first two rows are the same, so we want to eliminate one of them and set col4 to the mean of 5 and 4. The result should be:

  col1 col2 col3 col4
0    A 2001    2  4.5
1    A 2001    3    6
2    A 2002    4    5
3    B 2001    2 5.33
4    B 2001    3   95
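The two steps described in the question (set col4 to its group mean, then drop the rows whose first three columns repeat) can also be written out literally. This is a sketch using groupby(...).transform('mean') followed by drop_duplicates, an alternative to the groupby().mean() answer below; the frame construction assumes integer dtypes as shown in the table:

```python
import pandas as pd

df = pd.DataFrame({
    'col1': ['A', 'A', 'A', 'A', 'B', 'B', 'B', 'B'],
    'col2': [2001, 2001, 2001, 2002, 2001, 2001, 2001, 2001],
    'col3': [2, 2, 3, 4, 2, 2, 2, 3],
    'col4': [5, 4, 6, 5, 9, 4, 3, 95],
})

keys = ['col1', 'col2', 'col3']

# Step 1: replace col4 in every row by the mean of its (col1, col2, col3) group.
out = df.assign(col4=df.groupby(keys)['col4'].transform('mean'))

# Step 2: keep only the first row of each repeated key combination
# and renumber the index 0..n-1.
out = out.drop_duplicates(subset=keys).reset_index(drop=True)

print(out.round(2))
```

drop_duplicates keeps the first row of each key combination by default, so the rows come out in their original order rather than sorted by the keys.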

Upvotes: 1

Views: 68

Answers (1)

U13-Forward

Reputation: 71600

Use groupby to group by 'col1', 'col2', and 'col3', then take the mean of the 'col4' column:

print(df.groupby(['col1','col2','col3'],as_index=False)['col4'].mean())

Output:

  col1  col2  col3       col4
0    A  2001     2   4.500000
1    A  2001     3   6.000000
2    A  2002     4   5.000000
3    B  2001     2   5.333333
4    B  2001     3  95.000000
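A self-contained version of the same call, with the example data rebuilt by hand (integer dtypes assumed) and the result rounded to two decimals only for display, as in the question:

```python
import pandas as pd

df = pd.DataFrame({
    'col1': ['A', 'A', 'A', 'A', 'B', 'B', 'B', 'B'],
    'col2': [2001, 2001, 2001, 2002, 2001, 2001, 2001, 2001],
    'col3': [2, 2, 3, 4, 2, 2, 2, 3],
    'col4': [5, 4, 6, 5, 9, 4, 3, 95],
})

# Group on the three key columns; as_index=False keeps them as ordinary
# columns instead of moving them into the index.
result = df.groupby(['col1', 'col2', 'col3'], as_index=False)['col4'].mean()

# Round only for display; result itself keeps full precision.
print(result.round(2))
```

Without as_index=False the mean comes back as a Series indexed by (col1, col2, col3); calling .reset_index() on it produces the same five-row table. groupby sorts the group keys by default, which here happens to match the order in the question.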

Upvotes: 1
