Reputation: 879
I have the following pandas DataFrame:
import pandas as pd

df = pd.DataFrame({
    "category": ["one", "one", "one", "one", "two", "two", "two", "three", "three", "three"],
    "value": [2, 4, 3, 2, 5, 6, 5, 7, 8, 6]
})
>>> df
  category  value
0      one      2
1      one      4
2      one      3
3      one      2
4      two      5
5      two      6
6      two      5
7    three      7
8    three      8
9    three      6
I want to calculate a new column called normalized by computing the median per category (or any other groupby operation) and subtracting it (or applying any other simple operation) from the corresponding values in the non-grouped DataFrame. In non-pandas code, this is what I mean:
new_column = []
# Groupby equivalent: loop over each category
for cat in df["category"].unique():
    curr_df = df[df["category"] == cat]
    # Median of the "value" column only
    curr_median = curr_df["value"].median()
    # Calculation on groupby components
    for val in curr_df["value"]:
        normalized = val - curr_median
        new_column.append(normalized)
# Relies on the rows already being ordered by category
df["normalized"] = new_column
Which results in the following DataFrame:
df = pd.DataFrame({
    "category": ["one", "one", "one", "one", "two", "two", "two", "three", "three", "three"],
    "value": [2, 4, 3, 2, 5, 6, 5, 7, 8, 6],
    "normalized": [-0.5, 1.5, 0.5, -0.5, 0.0, 1.0, 0.0, 0.0, 1.0, -1.0]
})
>>> df
  category  value  normalized
0      one      2        -0.5
1      one      4         1.5
2      one      3         0.5
3      one      2        -0.5
4      two      5         0.0
5      two      6         1.0
6      two      5         0.0
7    three      7         0.0
8    three      8         1.0
9    three      6        -1.0
How could I write this in a nicer, pandas way? Thanks in advance :)
Upvotes: 0
Views: 66
Reputation: 11161
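transform is your friend. I think of it as apply for when I want to keep the original DataFrame's shape: it computes the result per group and broadcasts it back to every row of that group, so the output lines up with the original index. You can use this: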
df["normalized"] = df.value - df.groupby("category").value.transform("median")
output:
  category  value  normalized
0      one      2        -0.5
1      one      4         1.5
2      one      3         0.5
3      one      2        -0.5
4      two      5         0.0
5      two      6         1.0
6      two      5         0.0
7    three      7         0.0
8    three      8         1.0
9    three      6        -1.0
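Since the question also asks about "any other groupby operation", here is a minimal sketch of how the same transform pattern might be extended. The column names centered and range_scaled are made up for illustration and are not part of the question; the lambda is just one example of a custom per-group function.

```python
import pandas as pd

df = pd.DataFrame({
    "category": ["one", "one", "one", "one", "two", "two", "two", "three", "three", "three"],
    "value": [2, 4, 3, 2, 5, 6, 5, 7, 8, 6],
})

grouped = df.groupby("category")["value"]

# transform returns a Series with the same index and length as df,
# holding each row's group median
group_median = grouped.transform("median")

# Any other named aggregation can be swapped in, e.g. the group mean
group_mean = grouped.transform("mean")

# A callable also works as long as it returns one value per row of the group;
# here: min-max scaling within each category (illustrative only)
range_scaled = grouped.transform(lambda s: (s - s.min()) / (s.max() - s.min()))

df["normalized"] = df["value"] - group_median
df["centered"] = df["value"] - group_mean      # hypothetical column name
df["range_scaled"] = range_scaled              # hypothetical column name
```

Unlike the explicit loop in the question, this does not depend on the rows being sorted by category, because the transformed values are aligned to the original index.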
Upvotes: 2