Jusep
Jusep

Reputation: 187

Conditionally give value to one column based on the words that another column contains

I have a dataframe "df":

Patient    Condition
John       Adductor tear left
Mary       Adductor sprain left
Henry      Hamstring sprain
Lucy       Hamstring tear

If Adductor = 1 and Hamstring = 2, and on the other hand, tear = 1 and sprain = 2. I would like to create two columns, one for the Muscle, and another one for the injury, showing that value. The expected result would be:

Patient    Condition             Muscle   Injury
John       Adductor tear left      1        1
Mary       Adductor sprain left    1        2
Henry      Hamstring sprain        2        2
Lucy       Hamstring tear          2        1

I'm not sure if i should split the "condition" column and then replace the words. Or use a condition, but I'm not sure how to use the condition based on a word. Any idea how could I do it?

Upvotes: 0

Views: 69

Answers (3)

rhug123
rhug123

Reputation: 8768

Here is another way using enumerate() and assign()

m = {j:i for i,j in enumerate(['adductor','hamstring'],1)}
i = {j:i for i,j in enumerate(['tear','sprain'],1)}

col = df['Condition'].str.split()
df.assign(Muscle = col.str[0].str.lower().map(m), Injury = col.str[1].str.lower().map(i))

Upvotes: 1

user7864386
user7864386

Reputation:

You could also use str.contains to search for specific strings in Condition column and assign values using np.select.

import numpy as np
df['Muscle'] = np.select([df['Condition'].str.contains('Adductor'), df['Condition'].str.contains('Hamstring')], [1,2], np.nan)
df['Injury'] = np.select([df['Condition'].str.contains('tear'), df['Condition'].str.contains('sprain')], [1,2], np.nan)

Upvotes: 1

Ahmed Elashry
Ahmed Elashry

Reputation: 399

You do not need to split the condition column. You can use the following code given the assumptions:

  • every row in df contains either "Adductor" or "Hamstring" in the Condition column, case sensitive.
  • same thing with the words "tear" and "sprain".
df["Muscle"] = df["Condition"].apply(lambda x: 1 if "Adductor" in x else 2)

Same thing with the Injury column. You can try it and let me know if you need help.

If you do not want to worry about words being in upper or lower cases, you can use:

df["Muscle"] = df["Condition"].apply(lambda x: 1 if "adductor" in x.lower() else 2)

Update to address more than two muscle/injuries groups:

I will give example for handling muscles, and the same goes for injuries. You can create a dictionary where the keys are the muscle names and the values are the codes you want to put in the new columns. You can write the dictionary in lower case so that you do not need to worry about letter cases in the code.

muscle_dict = {
    "muscle_1": 1,
    "muscle_2": 2,
    "muscle_3": 3
}

Now, you can do the same trick with apply() function but with slightly more complex function.

df["Muscle"] = df["Condition"].apply(lambda x: muscle_dict[x.split()[0].lower()])

You can make the code more readable by creating a stand alone function that takes a string that you write in the Condition column and return the muscle code.

def get_muscle_code(cond):
    # it is better to define the muscle dictionary here
    muscle = cond.split()[0].lower()
    return muscle_dict[muscle]

# above code will be 
df["Muscle"] = df["Condition"].apply(get_muscle_code)

Upvotes: 1

Related Questions