Reputation: 269
I have a df like this:
Name Salary Age Cars Avg Salary Avg Age Avg Cars
John 50000 35 1 60000 38 1
Tom 65000 45 3 60000 38 1
Some columns are better when the values are higher, whereas other columns are vice versa. So I created two lists
higher_better = ['Salary', 'Cars']
lower_better = ['Age']
I want to compare them and return a score for them, so I define a new function like this:
def compare_higher(a, b):
return 1 if a > b else 0 if a == b else -1
compare_higher(higher_better, lower_better)
def compare_lower(b, a):
return 1 if a > b else 0 if a == b else -1
compare_lower(higher_better, lower_better)
I want to create new columns for the comparison results, then I can add their score together. Ideal output as below:
Name Salary Age Cars Avg Salary Avg Age Avg Cars Comp Salary Comp Age Comp Cars Score
John 50000 35 1 60000 38 1 -1 1 0 0
Tom 65000 45 3 60000 38 1 1 -1 1 1
I have the pieces but don't know how to combine them. How do you compare the values in the columns and return them in new columns? Thank you for your help.
Upvotes: 1
Views: 56
Reputation: 59519
You can also avoid apply with np.select
, which is useful for defining values for multiple conditions.
import numpy as np
import pandas as pd
df['Comp Salary'] = np.select([df.Salary < df['Avg Salary'], df.Salary == df['Avg Salary'],
df.Salary > df['Avg Salary']], [-1,0,1])
df['Comp Cars'] = np.select([df.Cars < df['Avg Cars'], df.Cars == df['Avg Cars'],
df.Cars > df['Avg Cars']], [-1,0,1])
df['Comp Age'] = np.select([df.Age < df['Avg Age'], df.Age == df['Avg Age'],
df.Age > df['Avg Age']], [1,0,-1])
df['Score'] = df[['Comp Salary', 'Comp Cars', 'Comp Age']].sum(axis=1)
Upvotes: 1
Reputation: 18208
I think you can use multiple apply
for this:
First you can apply to higher_better
:
for col in higher_better:
df['Comp ' + col] = df.apply(lambda row: compare_higher(row[col], row['Avg ' + col]), axis=1)
Then, to lower_better
:
for col in lower_better:
df['Comp ' + col] = df.apply(lambda row: compare_lower(row[col], row['Avg ' + col]), axis=1)
Finally, combine columns for both to score
:
comp_col = ['Comp '+ col for col in higher_better+lower_better]
df['score'] = df[comp_col].sum(axis=1)
Result:
Name Salary Age Cars Avg Salary Avg Age Avg Cars Comp Salary \
0 John 50000 35 1 60000 38 1 -1
1 Tom 65000 45 3 60000 38 1 1
Comp Cars Comp Age score
0 0 1 0
1 1 -1 1
Upvotes: 1