coelidonum

Reputation: 543

Create a single categorical column based on conditions on many numerical columns (pandas)

I have a pandas dataframe like this

df:

sEXT | sNEU | sAGR | sCON | sOPN
2.4  | 3    | 2    | 2    | 5
3    | 1    | 4    | 2.7  | 1.5

I want to create a column "type" according to the following rules, building the string one letter per column. If sEXT > 2.5, append "E", else "I". If sNEU > 2.5, append "N", else "S". If sAGR > 2.5, append "A", else "H". If sCON > 2.5, append "C", else "S". If sOPN > 2.5, append "O", else "C".

My expected output is:

sEXT | sNEU | sAGR | sCON | sOPN | type
2.4  | 3    | 2    | 2    | 5    | "INHSO"
3    | 1    | 4    | 2.7  | 1.5  | "ESACC"

I was trying

df['type']=None
df['type'].loc[df['sEXT']>2.5]='E'
df['type'].loc[df['sEXT']<2.5]='I'

But I don't know how to go on. Can you help me?

Upvotes: 0

Views: 292

Answers (2)

Dmytro Bugayev

Reputation: 686

Note: I haven't tested the code below, but I suggest using apply and putting your rules into a function such as logic_func, similar to this:

def logic_func(row):
    status = ''
    if row['sEXT'] > 2.5: status += 'E'
    # other conditions here
    return status

# Pass the function itself; apply calls it once per row when axis=1
df['type'] = df.apply(logic_func, axis=1)
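
One way to fill in the remaining conditions without a long chain of if/else statements is to keep the rules in a small lookup table. The sketch below is only an illustration: the RULES dictionary is a hypothetical helper name, and its letters and the 2.5 threshold are taken from the question.

```python
import pandas as pd

# Hypothetical rules table (not from the original answer):
# column -> (letter if value > 2.5, letter otherwise), as listed in the question
RULES = {
    "sEXT": ("E", "I"),
    "sNEU": ("N", "S"),
    "sAGR": ("A", "H"),
    "sCON": ("C", "S"),
    "sOPN": ("O", "C"),
}

def logic_func(row):
    # Pick one letter per column and join them in the order of RULES
    return "".join(hi if row[col] > 2.5 else lo for col, (hi, lo) in RULES.items())

df = pd.DataFrame({
    "sEXT": [2.4, 3], "sNEU": [3, 1], "sAGR": [2, 4],
    "sCON": [2, 2.7], "sOPN": [5, 1.5],
})
df["type"] = df.apply(logic_func, axis=1)
print(df["type"].tolist())
```

Dictionaries preserve insertion order, so the letters come out in the same order as the rules are listed.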

Upvotes: 0

RJ Adriaansen

Reputation: 9639

You can write a function that builds the string for one row, and then apply that function to every row of the dataframe:

import pandas as pd

data = [ { "sEXT": 2.4, "sNEU": 3, "sAGR": 2, "sCON": 2, "sOPN": 5 }, { "sEXT": 3, "sNEU": 1, "sAGR": 4, "sCON": 2.7, "sOPN": 1.5 } ]
df = pd.DataFrame(data)

def generate_type(row):
    text = ''
    if row['sEXT'] > 2.5:
        text += 'E'
    else:
        text += 'I'
    if row['sNEU'] > 2.5:
        text += 'N'
    else:
        text += 'S'
    if row['sAGR'] > 2.5:
        text += 'A'
    else:
        text += 'H'
    if row['sCON'] > 2.5:
        text += 'C'
    else:
        text += 'S'
    if row['sOPN'] > 2.5:
        text += 'O'
    else:
        text += 'C'
    return text
        
df['type']= df.apply(generate_type, axis=1)

Result:

   sEXT  sNEU  sAGR  sCON  sOPN   type
0   2.4     3     2     2     5  INHSO
1     3     1     4   2.7   1.5  ESACC
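
For larger dataframes, a row-wise apply can be slow. As a possible alternative (not part of the answer above), the same rules can be evaluated column by column with numpy.where and the resulting letters concatenated. This is a minimal sketch; the rules dictionary and the result variable are illustrative names.

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({
    "sEXT": [2.4, 3], "sNEU": [3, 1], "sAGR": [2, 4],
    "sCON": [2, 2.7], "sOPN": [5, 1.5],
})

# Illustrative rules table: column -> (letter if value > 2.5, letter otherwise)
rules = {
    "sEXT": ("E", "I"),
    "sNEU": ("N", "S"),
    "sAGR": ("A", "H"),
    "sCON": ("C", "S"),
    "sOPN": ("O", "C"),
}

# Start from empty strings and append one letter per column, whole column at a time
result = pd.Series("", index=df.index)
for col, (hi, lo) in rules.items():
    letters = pd.Series(np.where(df[col] > 2.5, hi, lo), index=df.index)
    result = result + letters

df["type"] = result
print(df)
```

Each np.where call handles a whole column at once, so the Python-level loop runs once per rule rather than once per row.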

Upvotes: 1
