Reputation: 121
I want to compare a column (Age) of my dataframe to some of the columns (1 to 7) and put a condition, like if any of the value of col 1 to 7 is 75% or less of value of col Age then I want to return the index of the first column matched.
Weight Name Age 1 2 3 4 5 6 7 Output
0 45 Sam 100 -75.0 -45.0 -92.0 -79.0 -57.0 -55.0 -35.0 true
1 88 Andrea 25 30.0 -17.0 -4.0 18.0 20.0 40.0 NaN true
2 56 Alex 55 -47.0 -34.0 -12.0 -10.0 10.0 NaN NaN true
3 15 Robin 8 13.0 35.0 37.0 57.0 NaN NaN NaN false
4 71 Kia 21 22.0 24.0 44.0 NaN NaN NaN NaN false
5 44 Sia 43 2.0 22.0 NaN NaN NaN NaN NaN true
6 54 Ryan 45 20.0 NaN NaN NaN NaN NaN NaN true
7 34 Dimi 65 NaN NaN NaN NaN NaN NaN NaN false
I am able to generate column with true and false, if in a row, any value is 75% less of the value in the column 'Age' with following code:
df['Output'] = ((df[['Age']].values)* 0.75 > df.iloc[:, 3:9].values).any(axis=1)
However, I want to get the name of the column, which is giving the output as true. If the output is false, I want to get the max value of that row, for which I am using .idxmax(axis=1).
Weight Name Age 1 2 3 4 5 6 7 Expected Output
0 45 Sam 100 -75.0 -45.0 -92.0 -79.0 -57.0 -55.0 -35.0 1
1 88 Andrea 25 30.0 -17.0 -4.0 18.0 20.0 40.0 NaN 2
2 56 Alex 55 -47.0 -34.0 -12.0 -10.0 10.0 NaN NaN 1
3 15 Robin 8 13.0 35.0 37.0 57.0 NaN NaN NaN 4
4 71 Kia 21 22.0 24.0 44.0 NaN NaN NaN NaN 3
5 44 Sia 43 2.0 22.0 NaN NaN NaN NaN NaN 1
6 54 Ryan 45 20.0 NaN NaN NaN NaN NaN NaN 1
7 34 Dimi 65 NaN NaN NaN NaN NaN NaN NaN NaN
Upvotes: 2
Views: 343
Reputation: 863166
Use DataFrame.lt
with axis=0
for compare DataFrame
with Series
and then compare by DataFrame.any
with DataFrame.idxmax
and set new values by numpy.where
:
df1 = df.iloc[:, 3:9]
mask = df1.lt(df['Age']* 0.75, axis=0)
df['Output'] = np.where(mask.any(axis=1), mask.idxmax(axis=1), df1.idxmax(axis=1))
print (df)
Weight Name Age 1 2 3 4 5 6 7 Output
0 45 Sam 100 -75.0 -45.0 -92.0 -79.0 -57.0 -55.0 -35.0 1
1 88 Andrea 25 30.0 -17.0 -4.0 18.0 20.0 40.0 NaN 2
2 56 Alex 55 -47.0 -34.0 -12.0 -10.0 10.0 NaN NaN 1
3 15 Robin 8 13.0 35.0 37.0 57.0 NaN NaN NaN 4
4 71 Kia 21 22.0 24.0 44.0 NaN NaN NaN NaN 3
5 44 Sia 43 2.0 22.0 NaN NaN NaN NaN NaN 1
6 54 Ryan 45 20.0 NaN NaN NaN NaN NaN NaN 1
7 34 Dimi 65 NaN NaN NaN NaN NaN NaN NaN NaN
Upvotes: 1
Reputation: 31
Try to use apply() method for iterating through the rows:
def compare_age(row):
age = row['age']
columns = [str(num) for num in range(1, 8)]
value = row[[columns]].max()
for column in columns:
if row[column] <= 0.75*age:
value = float(column)
break
return value
df['Output'] = df.apply(compare_age, axis=1)
Upvotes: 1