Reputation: 47
I am trying to create a new column that holds the mean of the values in one column, grouped by the value in another column.
df = pd.DataFrame({"A": [1, 2, 1, 2, 3],
                   "B": [4, 6, 8, 12, 4]})
I want to create a new column 'C' that would be
pd.DataFrame({"A": [1, 2, 1, 2, 3],
              "B": [4, 6, 8, 12, 4],
              "C": [6, 9, 6, 9, 4]})
If it is not clear, I want to output the mean of the values in column B over the rows where the values in column A are the same. So, C = (4 + 8 + ...) / n where A == 1 and C = (6 + 12 + ...) / n where A == 2, etc., with n being the number of rows in that group.
I am also having trouble working out the pseudocode for this. Any logical explanation in addition to a code solution would be appreciated.
Upvotes: 1
Views: 53
Reputation: 375485
That's a transform:
In [11]: df
Out[11]:
   A   B
0  1   4
1  2   6
2  1   8
3  2  12
4  3   4
In [12]: df.groupby("A")["B"].transform('mean')
Out[12]:
0 6
1 9
2 6
3 9
4 4
Name: B, dtype: int64
In [13]: df["C"] = df.groupby("A")["B"].transform('mean')
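Since the question also asks for the logic, here is a rough sketch of what the transform is doing, spelled out in two steps. The names `group_means` and `c_via_map` are just illustrative, and the frame is the one from the question. Note that recent pandas versions will likely give float results (6.0, 9.0, ...) rather than the integers in the session above.

```python
import pandas as pd

df = pd.DataFrame({"A": [1, 2, 1, 2, 3],
                   "B": [4, 6, 8, 12, 4]})

# Step 1 (split + apply): compute one mean of B per distinct value of A.
# The result is a small lookup table indexed by the values of A.
group_means = df.groupby("A")["B"].mean()

# Step 2 (combine): look up each row's A in that table, which gives
# a Series aligned with the original rows.
c_via_map = df["A"].map(group_means)

# transform("mean") does both steps at once and returns a Series with
# the same index as df, so it can be assigned straight to a new column.
df["C"] = df.groupby("A")["B"].transform("mean")

print(df)
```

The two-step version is only there to show the idea; `transform` is the shorter way to get a per-group result broadcast back onto every row.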
See also the groupby docs.
Upvotes: 1