Karma
Karma

Reputation: 269

Compare values in different columns

I have a df like this:

Name   Salary   Age   Cars   Avg Salary   Avg Age   Avg Cars
John    50000    35      1        60000        38          1
 Tom    65000    45      3        60000        38          1

Some columns are better when the values are higher, whereas other columns are vice versa. So I created two lists

higher_better = ['Salary', 'Cars']
lower_better = ['Age']

I want to compare them and return a score for them, so I define a new function like this:

def compare_higher(a, b):
    return 1 if a > b else 0 if a == b else -1
compare_higher(higher_better, lower_better)

def compare_lower(b, a):
    return 1 if a > b else 0 if a == b else -1
compare_lower(higher_better, lower_better)

I want to create new columns for the comparison results, then I can add their score together. Ideal output as below:

Name   Salary   Age   Cars   Avg Salary   Avg Age   Avg Cars   Comp Salary   Comp Age   Comp Cars   Score
John    50000    35      1        60000        38          1        -1             1          0       0
 Tom    65000    45      3        60000        38          1         1            -1          1       1

I have the pieces but don't know how to combine them. How do you compare the values in the columns and return them in new columns? Thank you for your help.

Upvotes: 1

Views: 56

Answers (2)

ALollz
ALollz

Reputation: 59519

You can also avoid apply with np.select, which is useful for defining values for multiple conditions.

import numpy as np
import pandas as pd

df['Comp Salary'] = np.select([df.Salary < df['Avg Salary'], df.Salary == df['Avg Salary'], 
    df.Salary > df['Avg Salary']], [-1,0,1])
df['Comp Cars'] = np.select([df.Cars < df['Avg Cars'], df.Cars == df['Avg Cars'], 
    df.Cars > df['Avg Cars']], [-1,0,1])
df['Comp Age'] = np.select([df.Age < df['Avg Age'], df.Age == df['Avg Age'], 
    df.Age > df['Avg Age']], [1,0,-1])

df['Score'] = df[['Comp Salary', 'Comp Cars', 'Comp Age']].sum(axis=1)

Upvotes: 1

niraj
niraj

Reputation: 18208

I think you can use multiple apply for this: First you can apply to higher_better:

for col in higher_better:
    df['Comp ' + col] = df.apply(lambda row: compare_higher(row[col], row['Avg ' + col]), axis=1)

Then, to lower_better:

for col in lower_better:
    df['Comp ' + col] = df.apply(lambda row: compare_lower(row[col], row['Avg ' + col]), axis=1)

Finally, combine columns for both to score:

comp_col = ['Comp '+ col for col in higher_better+lower_better]
df['score'] = df[comp_col].sum(axis=1)

Result:

   Name  Salary  Age  Cars  Avg Salary  Avg Age  Avg Cars  Comp Salary  \
0  John   50000   35     1       60000       38         1           -1   
1   Tom   65000   45     3       60000       38         1            1   

   Comp Cars  Comp Age  score  
0          0         1      0  
1          1        -1      1  

Upvotes: 1

Related Questions