Create a single categorical column based on conditions on many numerical columns (pandas)

Question

I have a pandas dataframe like this:

df:

sEXT | sNEU | sAGR | sCON | sOPN
2.4  | 3    | 2    | 2    | 5
3    | 1    | 4    | 2.7  | 1.5

I want to create a column "type" according to the following rules. If sEXT > 2.5, add the string "E" to type, else "I". If sNEU > 2.5, add "N", else "S". If sAGR > 2.5, add "A", else "H". If sCON > 2.5, add "C", else "S". If sOPN > 2.5, add "O", else "C".

My expected output is:

sEXT | sNEU | sAGR | sCON | sOPN | type
2.4  | 3    | 2    | 2    | 5    | "INHSO"
3    | 1    | 4    | 2.7  | 1.5  | "ESACC"

I tried this:

df['type'] = None
df.loc[df['sEXT'] > 2.5, 'type'] = 'E'
df.loc[df['sEXT'] <= 2.5, 'type'] = 'I'

But I don't know how to go on. Can you help me?

RJ Adriaansen · Accepted Answer

You can write a function that builds the string for a single row, and then apply that function to every row of the dataframe with axis=1:

import pandas as pd

data = [ { "sEXT": 2.4, "sNEU": 3, "sAGR": 2, "sCON": 2, "sOPN": 5 }, { "sEXT": 3, "sNEU": 1, "sAGR": 4, "sCON": 2.7, "sOPN": 1.5 } ]
df = pd.DataFrame(data)

def generate_type(row):
    text = ''
    if row['sEXT'] > 2.5:
        text += 'E'
    else:
        text += 'I'
    if row['sNEU'] > 2.5:
        text += 'N'
    else:
        text += 'S'
    if row['sAGR'] > 2.5:
        text += 'A'
    else:
        text += 'H'
    if row['sCON'] > 2.5:
        text += 'C'
    else:
        text += 'S'
    if row['sOPN'] > 2.5:
        text += 'O'
    else:
        text += 'C'
    return text
        
df['type']= df.apply(generate_type, axis=1)

Result:

	sEXT	sNEU	sAGR	sCON	sOPN	type
0	2.4	3	2	2	5	INHSO
1	3	1	4	2.7	1.5	ESACC
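
If the dataframe is large, a row-wise apply can get slow. As a rough sketch of an alternative (not part of the answer above), the same rules can be kept in a dictionary and evaluated one column at a time with numpy.where. The names RULES, THRESHOLD and build_type are made up for this illustration, and the letters are taken from the rules in the question, with a value of exactly 2.5 falling into the "else" branch as in generate_type.

```python
import numpy as np
import pandas as pd

df = pd.DataFrame(
    [
        {"sEXT": 2.4, "sNEU": 3, "sAGR": 2, "sCON": 2, "sOPN": 5},
        {"sEXT": 3, "sNEU": 1, "sAGR": 4, "sCON": 2.7, "sOPN": 1.5},
    ]
)

# Hypothetical lookup table: column -> (letter if value > threshold, letter otherwise).
# The letters come from the rules stated in the question.
RULES = {
    "sEXT": ("E", "I"),
    "sNEU": ("N", "S"),
    "sAGR": ("A", "H"),
    "sCON": ("C", "S"),
    "sOPN": ("O", "C"),
}
THRESHOLD = 2.5


def build_type(frame, rules=RULES, threshold=THRESHOLD):
    """Return a Series holding one letter per rule, concatenated per row."""
    # One array of letters per column; each comparison runs on the whole column.
    parts = [
        np.where(frame[column] > threshold, high, low)
        for column, (high, low) in rules.items()
    ]
    # zip(*parts) walks the arrays in parallel, giving the letters of one row at a time.
    joined = ["".join(letters) for letters in zip(*parts)]
    return pd.Series(joined, index=frame.index)


df["type"] = build_type(df)
print(df)
```

Keeping the rules in one dictionary also means that adding a column or changing a letter is a one-line change instead of another if/else block.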
