Reputation: 187
I have a dataframe "df":
Patient Condition
John Adductor tear left
Mary Adductor sprain left
Henry Hamstring sprain
Lucy Hamstring tear
If Adductor = 1 and Hamstring = 2, and on the other hand, tear = 1 and sprain = 2. I would like to create two columns, one for the Muscle, and another one for the injury, showing that value. The expected result would be:
Patient Condition Muscle Injury
John Adductor tear left 1 1
Mary Adductor sprain left 1 2
Henry Hamstring sprain 2 2
Lucy Hamstring tear 2 1
I'm not sure if i should split the "condition" column and then replace the words. Or use a condition, but I'm not sure how to use the condition based on a word. Any idea how could I do it?
Upvotes: 0
Views: 69
Reputation: 8768
Here is another way using enumerate()
and assign()
m = {j:i for i,j in enumerate(['adductor','hamstring'],1)}
i = {j:i for i,j in enumerate(['tear','sprain'],1)}
col = df['Condition'].str.split()
df.assign(Muscle = col.str[0].str.lower().map(m), Injury = col.str[1].str.lower().map(i))
Upvotes: 1
Reputation:
You could also use str.contains
to search for specific strings in Condition column and assign values using np.select
.
import numpy as np
df['Muscle'] = np.select([df['Condition'].str.contains('Adductor'), df['Condition'].str.contains('Hamstring')], [1,2], np.nan)
df['Injury'] = np.select([df['Condition'].str.contains('tear'), df['Condition'].str.contains('sprain')], [1,2], np.nan)
Upvotes: 1
Reputation: 399
You do not need to split the condition column. You can use the following code given the assumptions:
df
contains either "Adductor"
or "Hamstring"
in the Condition
column, case sensitive."tear"
and "sprain"
.df["Muscle"] = df["Condition"].apply(lambda x: 1 if "Adductor" in x else 2)
Same thing with the Injury
column. You can try it and let me know if you need help.
If you do not want to worry about words being in upper or lower cases, you can use:
df["Muscle"] = df["Condition"].apply(lambda x: 1 if "adductor" in x.lower() else 2)
Update to address more than two muscle/injuries groups:
I will give example for handling muscles, and the same goes for injuries. You can create a dictionary where the keys are the muscle names and the values are the codes you want to put in the new columns. You can write the dictionary in lower case so that you do not need to worry about letter cases in the code.
muscle_dict = {
"muscle_1": 1,
"muscle_2": 2,
"muscle_3": 3
}
Now, you can do the same trick with apply()
function but with slightly more complex function.
df["Muscle"] = df["Condition"].apply(lambda x: muscle_dict[x.split()[0].lower()])
You can make the code more readable by creating a stand alone function that takes a string that you write in the Condition
column and return the muscle code.
def get_muscle_code(cond):
# it is better to define the muscle dictionary here
muscle = cond.split()[0].lower()
return muscle_dict[muscle]
# above code will be
df["Muscle"] = df["Condition"].apply(get_muscle_code)
Upvotes: 1