Create a Column in a Dataframe Conditional on Other Columns

Question

I am trying to create a new column that holds the mean of the values in column B, computed separately for each value in column A.

pd.DataFrame({"A": [1, 2, 1, 2, 3],
              "B": [4, 6, 8, 12, 4]})

I want to create a new column 'C' so that the DataFrame becomes:

pd.DataFrame({"A": [1, 2, 1, 2, 3],
              "B": [4, 6, 8, 12, 4],
              "C": [6, 9, 6, 9, 4]})

In other words, for each row, C should be the mean of the values in column B over all rows that share the same value in column A. So C = (4 + 8 + ...) / n where A == 1, C = (6 + 12 + ...) / n where A == 2, and so on, where n is the number of rows in that group.

I am also having trouble working out the pseudocode for this. A logical explanation in addition to a code solution would be appreciated.
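The logic being asked for can be sketched in plain Python, without pandas, as two passes over the rows: first accumulate a sum and a count for each value of A, then look up each row's group mean. This is only an illustration of the idea; the function name group_mean_column is hypothetical.

```python
from collections import defaultdict


def group_mean_column(keys, values):
    """Hypothetical helper: for each row, return the mean of `values`
    over all rows that share that row's key."""
    # Pass 1: accumulate a running sum and a row count per key.
    totals = defaultdict(float)
    counts = defaultdict(int)
    for k, v in zip(keys, values):
        totals[k] += v
        counts[k] += 1
    # Pass 2: emit each row's group mean, preserving the original row order.
    return [totals[k] / counts[k] for k in keys]


A = [1, 2, 1, 2, 3]
B = [4, 6, 8, 12, 4]
C = group_mean_column(A, B)
print(C)  # [6.0, 9.0, 6.0, 9.0, 4.0]
```

The second pass is what makes the result line up with the original rows: every row gets its group's mean, not just one value per group.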

Andy Hayden · Accepted Answer

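That's a transform: group the rows by A, compute the mean of B within each group, and broadcast that mean back to every row of the group: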

In [11]: df
Out[11]:
   A   B
0  1   4
1  2   6
2  1   8
3  2  12
4  3   4

In [12]: df.groupby("A")["B"].transform('mean')
Out[12]:
0    6
1    9
2    6
3    9
4    4
Name: B, dtype: int64

In [13]: df["C"] = df.groupby("A")["B"].transform('mean')

See also the "Group by: split-apply-combine" section of the pandas docs.
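A self-contained version of the session above, with an equivalent two-step alternative for comparison: aggregate to one mean per group with groupby().mean(), then map each row's A value to its group's mean. The column name C_via_map is only for illustration. Depending on the pandas version, transform('mean') may return float64 (6.0, 9.0, ...) rather than the int64 shown in the session above.

```python
import pandas as pd

df = pd.DataFrame({"A": [1, 2, 1, 2, 3],
                   "B": [4, 6, 8, 12, 4]})

# transform: returns one value per original row, aligned on df's index.
df["C"] = df.groupby("A")["B"].transform("mean")

# Two-step alternative: one mean per group (a Series indexed by A),
# then look up each row's A value in that Series.
group_means = df.groupby("A")["B"].mean()
df["C_via_map"] = df["A"].map(group_means)

print(df)
```

transform is usually the more direct choice because its output already has the same length and index as df; the map version makes the "aggregate, then broadcast back" step explicit.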
