Given the following dataframe:
import pandas as pd

x = pd.DataFrame(
    {"a": [1, 2, 3, 2], "b_1": [0, 0, 0, 0], "b_2": [0, 0, 0, 0], "b_3": [0, 0, 0, 0]}
)
which looks like:
a b_1 b_2 b_3
0 1 0 0 0
1 2 0 0 0
2 3 0 0 0
3 2 0 0 0
How can it be converted to:
y = pd.DataFrame(
{
"a": [1, 2, 3, 2],
"b_1": [-1, 0, 0, 0],
"b_2": [0, -1, 0, -1],
"b_3": [0, 0, -1, 0],
}
)
which looks like:
a b_1 b_2 b_3
0 1 -1 0 0
1 2 0 -1 0
2 3 0 0 -1
3 2 0 -1 0
Here's a solution:
# Long format: one row per (original row, b_* column), keeping the original index
x1 = x.melt(id_vars="a", ignore_index=False)
# Numeric suffix of each column name, e.g. "b_2" -> 2
x1["value_2"] = x1["variable"].str.split("_").str[1].astype(int)
# Set -1 where "a" equals the column's suffix
x1.loc[x1["a"].eq(x1["value_2"]), "value"] = -1
x1 = x1.drop("value_2", axis=1)
# Back to wide format, with "a" restored as a regular column
x1 = x1.set_index(["a", "variable"], append=True)["value"].unstack().reset_index(level=1)
It feels quite messy, though.
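For comparison, here is a minimal sketch of a different technique (NumPy broadcasting, not the melt approach above). It assumes every indicator column is named `b_<integer>`; the names `b_cols`, `suffixes` and `mask` are illustrative only. Each row's `a` is compared against every column suffix at once, producing a boolean grid that is then subtracted.

```python
import numpy as np
import pandas as pd

x = pd.DataFrame(
    {"a": [1, 2, 3, 2], "b_1": [0, 0, 0, 0], "b_2": [0, 0, 0, 0], "b_3": [0, 0, 0, 0]}
)

# Assumption: indicator columns follow the "b_<integer>" naming pattern
b_cols = [c for c in x.columns if c.startswith("b_")]
suffixes = np.array([int(c.split("_")[1]) for c in b_cols])

# Broadcast (n_rows, 1) against (1, n_cols): True where a[i] == suffix of column j
mask = x["a"].to_numpy()[:, None] == suffixes[None, :]

# Subtract 1 wherever the row's "a" matches the column suffix
y = x.copy()
y[b_cols] = x[b_cols].to_numpy() - mask.astype(int)
print(y)
```

This avoids reshaping to long format and back, at the cost of relying on the column-name convention.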
You can use pd.get_dummies.
print(pd.get_dummies(x['a']).add_prefix('b_'))
b_1 b_2 b_3
0 1 0 0
1 0 1 0
2 0 0 1
3 0 1 0
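Note that since pandas 2.0, get_dummies returns bool columns by default, so on recent versions the print above shows True/False instead of 0/1. A small sketch that requests integer indicators explicitly with the `dtype` parameter (`dummies` is just an illustrative name):

```python
import pandas as pd

x = pd.DataFrame({"a": [1, 2, 3, 2]})

# One indicator column per distinct value in "a" (labels 1, 2, 3),
# renamed to b_1, b_2, b_3 by add_prefix; dtype=int gives 0/1 instead of bools
dummies = pd.get_dummies(x["a"], dtype=int).add_prefix("b_")
print(dummies)
```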
Then there are several ways to subtract it from x. For example, using reindex:
y = x - pd.get_dummies(x['a']).add_prefix('b_').reindex(columns=x.columns, fill_value=0)
print(y)
a b_1 b_2 b_3
0 1 -1 0 0
1 2 0 -1 0
2 3 0 0 -1
3 2 0 -1 0
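To make the role of reindex visible, the sketch below keeps the aligned indicator frame as an intermediate (`aligned` is an illustrative name, and `dtype=int` is added only so the intermediate prints as 0/1). Reindexing to x's columns inserts an all-zero "a" column, so the subtraction leaves "a" unchanged and only touches the b_* columns.

```python
import pandas as pd

x = pd.DataFrame(
    {"a": [1, 2, 3, 2], "b_1": [0, 0, 0, 0], "b_2": [0, 0, 0, 0], "b_3": [0, 0, 0, 0]}
)

# Indicators aligned to x's column layout; the missing "a" column is filled with 0
aligned = (
    pd.get_dummies(x["a"], dtype=int)
    .add_prefix("b_")
    .reindex(columns=x.columns, fill_value=0)
)
print(aligned)

# Same shape and labels as x, so the subtraction is purely element-wise
y = x - aligned
print(y)
```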
Note that if the b_* columns don't already exist in x and you want to generate them automatically from column a, something like this works too:
x = pd.DataFrame({"a": [1, 2, 3, 2]})
y = x.sub(pd.get_dummies(x['a']).add_prefix('b_'), fill_value=0)
print(y)
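The output of this variant isn't shown above. As far as I can tell, sub aligns the two frames first (introducing NaN in the columns each side lacks) and only then replaces those NaNs with `fill_value`, so the result typically comes back as float. A minimal sketch that casts back to int, assuming integer output is what you want:

```python
import pandas as pd

x = pd.DataFrame({"a": [1, 2, 3, 2]})

# Columns present on only one side are treated as 0 on the other side,
# so the result holds the union of labels: a, b_1, b_2, b_3
y = x.sub(pd.get_dummies(x["a"], dtype=int).add_prefix("b_"), fill_value=0)

# Alignment goes through NaN before filling, so cast back to integers
y = y.astype(int)
print(y)
```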