Reputation: 173
I have the following df, containing allocations for Stratification groups in a randomized controlled trial.
import numpy as np
import pandas as pd
df = pd.DataFrame([[1, "ABABBBAAAB"], [2, "BBABBBAAAA"], [3, "ABBAABABAB"]], columns=['StratID', 'Rand'])
df
   StratID        Rand
0        1  ABABBBAAAB
1        2  BBABBBAAAA
2        3  ABBAABABAB
I want to use np.where to trim the length of the Stratification groups based on the StratID. For example, StratID 1 and 3 should be trimmed to retain only the first 6 allocations:
df["trimmed_col"] = np.where(df["StratID"].isin({1, 3}), df.Rand.str[:6], "")
df
   StratID        Rand trimmed_col
0        1  ABABBBAAAB      ABABBB
1        2  BBABBBAAAA
2        3  ABBAABABAB      ABBAAB
But when I do the same for the remaining StratID 2, trimming it to the first 4 allocations, I overwrite what I did above and get the following:
df["trimmed_col"] = np.where(df["StratID"].isin({2}), df.Rand.str[:4], "")
df
   StratID        Rand trimmed_col
0        1  ABABBBAAAB
1        2  BBABBBAAAA        BBAB
2        3  ABBAABABAB
How can I apply both changes to the dataframe at once, so that I get the following output?
   StratID        Rand trimmed_col
0        1  ABABBBAAAB      ABABBB
1        2  BBABBBAAAA        BBAB
2        3  ABBAABABAB      ABBAAB
Upvotes: 1
Views: 52
Reputation: 5918
To apply several conditions like these, I would go with np.select.
Code
conditions = [
    df["StratID"].isin({1, 3}),
    df["StratID"].isin({2}),
]
choices = [df.Rand.str[:6], df.Rand.str[:4]]
df['trimmed_col'] = np.select(conditions, choices)
Input
   StratID        Rand
0        1  ABABBBAAAB
1        2  BBABBBAAAA
2        3  ABBAABABAB
Output
   StratID        Rand trimmed_col
0        1  ABABBBAAAB      ABABBB
1        2  BBABBBAAAA        BBAB
2        3  ABBAABABAB      ABBAAB
Explanation
For a single if-else condition, np.where or df.where is fine.
For multiple if-elif-else conditions, use np.select.
Here,
conditions list - holds all the conditions to apply; they are checked in order and the first one that matches wins.
choices list - holds the output for each condition, in the same order as the conditions.
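One thing to watch for: np.select fills rows that match none of the conditions with its default value, which is 0 unless you pass default. The sketch below assumes you want an empty string there instead, as in the question's np.where calls. The row with StratID 4 is a made-up example added only to show the fallback, and the nested np.where at the end is just to illustrate what np.select does with the same conditions.

```python
import numpy as np
import pandas as pd

# Same data as the question, plus a hypothetical StratID 4 row
# that matches neither condition.
df = pd.DataFrame(
    [[1, "ABABBBAAAB"], [2, "BBABBBAAAA"], [3, "ABBAABABAB"], [4, "AABBAABBAB"]],
    columns=["StratID", "Rand"],
)

conditions = [
    df["StratID"].isin({1, 3}),  # keep the first 6 allocations
    df["StratID"].isin({2}),     # keep the first 4 allocations
]
choices = [df["Rand"].str[:6], df["Rand"].str[:4]]

# default="" is used for rows where no condition is True (StratID 4 here).
df["trimmed_col"] = np.select(conditions, choices, default="")

# The same logic written as nested np.where calls, for comparison.
nested = np.where(
    conditions[0], choices[0],
    np.where(conditions[1], choices[1], ""),
)
```

Both approaches give the same values here; np.select just stays readable as the number of conditions grows.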
Upvotes: 2