Saurabh
Saurabh

Reputation: 121

Compare a column with multiple columns and return the index where condition matched first in pandas

I want to compare a column (Age) of my dataframe to some of the columns (1 to 7) and put a condition, like if any of the value of col 1 to 7 is 75% or less of value of col Age then I want to return the index of the first column matched.

   Weight Name   Age     1     2        3      4        5      6        7    Output
  0   45  Sam     100 -75.0  -45.0    -92.0  -79.0   -57.0   -55.0   -35.0   true
  1   88  Andrea  25   30.0  -17.0     -4.0   18.0    20.0    40.0    NaN    true
  2   56  Alex    55  -47.0  -34.0    -12.0  -10.0    10.0    NaN     NaN    true
  3   15  Robin   8    13.0   35.0     37.0   57.0    NaN     NaN     NaN    false
  4   71  Kia     21   22.0   24.0     44.0    NaN    NaN     NaN     NaN    false
  5   44  Sia     43    2.0   22.0      NaN    NaN    NaN     NaN     NaN    true
  6   54  Ryan    45   20.0    NaN      NaN    NaN    NaN     NaN     NaN    true
  7   34  Dimi    65    NaN    NaN      NaN    NaN    NaN     NaN     NaN    false

I am able to generate column with true and false, if in a row, any value is 75% less of the value in the column 'Age' with following code:

 df['Output'] = ((df[['Age']].values)* 0.75  > df.iloc[:, 3:9].values).any(axis=1) 

However, I want to get the name of the column, which is giving the output as true. If the output is false, I want to get the max value of that row, for which I am using .idxmax(axis=1).

   Weight Name   Age     1     2        3      4        5      6        7   Expected Output
  0   45  Sam     100 -75.0  -45.0    -92.0  -79.0   -57.0   -55.0   -35.0   1
  1   88  Andrea  25   30.0  -17.0     -4.0   18.0    20.0    40.0    NaN    2
  2   56  Alex    55  -47.0  -34.0    -12.0  -10.0    10.0    NaN     NaN    1
  3   15  Robin   8    13.0   35.0     37.0   57.0    NaN     NaN     NaN    4
  4   71  Kia     21   22.0   24.0     44.0    NaN    NaN     NaN     NaN    3
  5   44  Sia     43    2.0   22.0      NaN    NaN    NaN     NaN     NaN    1
  6   54  Ryan    45   20.0    NaN      NaN    NaN    NaN     NaN     NaN    1
  7   34  Dimi    65    NaN    NaN      NaN    NaN    NaN     NaN     NaN    NaN

Upvotes: 2

Views: 343

Answers (2)

jezrael
jezrael

Reputation: 863166

Use DataFrame.lt with axis=0 for compare DataFrame with Series and then compare by DataFrame.any with DataFrame.idxmax and set new values by numpy.where:

df1 = df.iloc[:, 3:9]
mask = df1.lt(df['Age']* 0.75, axis=0)
df['Output'] = np.where(mask.any(axis=1), mask.idxmax(axis=1), df1.idxmax(axis=1))
print (df)
   Weight    Name  Age     1     2     3     4     5     6     7 Output
0      45     Sam  100 -75.0 -45.0 -92.0 -79.0 -57.0 -55.0 -35.0      1
1      88  Andrea   25  30.0 -17.0  -4.0  18.0  20.0  40.0   NaN      2
2      56    Alex   55 -47.0 -34.0 -12.0 -10.0  10.0   NaN   NaN      1
3      15   Robin    8  13.0  35.0  37.0  57.0   NaN   NaN   NaN      4
4      71     Kia   21  22.0  24.0  44.0   NaN   NaN   NaN   NaN      3
5      44     Sia   43   2.0  22.0   NaN   NaN   NaN   NaN   NaN      1
6      54    Ryan   45  20.0   NaN   NaN   NaN   NaN   NaN   NaN      1
7      34    Dimi   65   NaN   NaN   NaN   NaN   NaN   NaN   NaN    NaN

Upvotes: 1

Yevhenii Avlasenko
Yevhenii Avlasenko

Reputation: 31

Try to use apply() method for iterating through the rows:

def compare_age(row):
    age = row['age']
    columns = [str(num) for num in range(1, 8)]
    value = row[[columns]].max()
    for column in columns:
        if row[column] <= 0.75*age:
            value = float(column)
            break
    return value

df['Output'] = df.apply(compare_age, axis=1)

Upvotes: 1

Related Questions