I have a dataframe:
group id
A 009x
A 010x
B 009x
B 002x
C 002x
C 003x
How do I make a new column, new, that categorizes the rows of each group under the following three conditions?
1. If the group's id values consist ONLY of 009x and 010x, categorize as g1.
2. If one of the group's id values is 009x or 010x AND another id value is not 009x or 010x, categorize as g2.
3. If none of the group's id values is 009x or 010x, keep the id value itself.
Desired result:
group id new
A 009x g1
A 010x g1
B 009x g2
B 002x g2
C 002x 002x
C 003x 003x
import pandas as pd

data = {
    'group': ['A', 'A', 'B', 'B', 'C', 'C'],
    'id': ['009x', '010x', '009x', '002x', '002x', '003x'],
}
df = pd.DataFrame(data)
print(df)
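The three rules can be illustrated without pandas. This is a minimal sketch, assuming each group's ids are available as a plain Python list; label_group and TARGETS are hypothetical names used only for illustration. The order of the checks matters: "all ids are targets" must be tested before "any id is a target".

```python
# Hypothetical set of "target" ids from the question
TARGETS = {"009x", "010x"}


def label_group(ids):
    """Hypothetical helper: apply the three rules to the ids of ONE group
    and return one label per row, in the same order."""
    hits = [i in TARGETS for i in ids]
    if all(hits):
        # Rule 1: the group contains only 009x / 010x
        return ["g1"] * len(ids)
    if any(hits):
        # Rule 2: a mix of target and non-target ids
        return ["g2"] * len(ids)
    # Rule 3: no target ids at all, so keep each id as-is
    return list(ids)


groups = {"A": ["009x", "010x"], "B": ["009x", "002x"], "C": ["002x", "003x"]}
for name, ids in groups.items():
    print(name, label_group(ids))
```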
I hope I've understood your question correctly. You can use .groupby() with a custom function:
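def categorize_fn(x):
    # True for rows whose id is 009x or 010x
    tmp = x["id"].isin(["009x", "010x"])
    if tmp.all():
        # every id in the group is 009x/010x
        x["new"] = "g1"
    elif tmp.any():
        # at least one id is 009x/010x, but not all of them
        x["new"] = "g2"
    else:
        # no id is 009x/010x: keep the id itself
        x["new"] = x["id"]
    return x

df = df.groupby("group", group_keys=False).apply(categorize_fn)
print(df)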
Prints:
group id new
0 A 009x g1
1 A 010x g1
2 B 009x g2
3 B 002x g2
4 C 002x 002x
5 C 003x 003x
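A vectorized variant is also possible that avoids calling a Python function once per group (recent pandas versions, 2.2 and later, also warn when DataFrameGroupBy.apply passes the grouping column to the function). The sketch below is an alternative technique, not the approach above: it broadcasts per-group all()/any() flags back onto every row with groupby().transform() and picks the label with numpy.select. It assumes NumPy is installed alongside pandas.

```python
import numpy as np
import pandas as pd

data = {
    'group': ['A', 'A', 'B', 'B', 'C', 'C'],
    'id': ['009x', '010x', '009x', '002x', '002x', '003x'],
}
df = pd.DataFrame(data)

# True where the row's id is one of the target ids
is_target = df["id"].isin(["009x", "010x"])

# Broadcast each group's all()/any() result back onto every row of that group
all_target = is_target.groupby(df["group"]).transform("all")
any_target = is_target.groupby(df["group"]).transform("any")

# np.select uses the first condition that is True for each row,
# so "all" (g1) is checked before "any" (g2); otherwise keep the id
df["new"] = np.select([all_target, any_target], ["g1", "g2"], default=df["id"])
print(df)
```

Because the conditions are evaluated in list order, the g1 check has to come before the g2 check, mirroring the if/elif order in categorize_fn.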