Reputation: 458

Using a lambda expression for multiple columns in python

I have this data:

I want to make a new column where if we have an 'X' in delinquency or Suspect LTV has a 'LTV < 10%' Then I will have a 'Yes' in an entry in a new column.

Example of what the output should look like using this data:

Now if there is no X and no LTV < 10% then I will just have a 'No'. I am going to be applying this across many columns but I just focused on two to not make it more complicated.

This is what I have tried:

def exceptions(column):
    for i in column:
        if i == 'X':
            return 'Yes'
        elif i == 'LTV < 10%':
            return 'Yes'
        else:
            return 'No'


df1['Exceptions'] = df1.apply(lambda x : exceptions(x['Delinquency 2+ Month Change']), axis = 1)
df1['Exceptions'] = df1.apply(lambda x : exceptions(x['Suspect LTV']), axis = 1)

Upvotes: 0

Answers (3)

Karn Kumar

Reputation: 8816

As @Erfan mention in the comment section, you don't really need a for loop or lambda to achieve this, You need np.where.

One way with np.where
Or you can use np.where with np.logical_or

DataFrame:

>>> df
  Delinquency 2+ Month Change Suspect LTV
0                           X
1                           X
2                           X
3                           X
4                           X
5                           X   LTV < 10%

Result:

1- Use OR operator as pipe | with np.where

>>> df['Exceptions'] = np.where((df['Delinquency 2+ Month Change'] == 'X') |  (df['Suspect LTV'] == 'LTV < 10%'), 'Yes', 'No')
>>> df
  Delinquency 2+ Month Change Suspect LTV Exceptions
0                           X                    Yes
1                           X                    Yes
2                           X                    Yes
3                           X                    Yes
4                           X                    Yes
5                           X   LTV < 10%        Yes

2- Second method as i mention at the start with np.where with np.logical_or

>>> df['Exceptions'] = np.where(np.logical_or(df['Delinquency 2+ Month Change'] == 'X', df['Suspect LTV'] == 'LTV < 10%'), 'yes', 'no')
>>> df
  Delinquency 2+ Month Change Suspect LTV Exceptions
0                           X                    yes
1                           X                    yes
2                           X                    yes
3                           X                    yes
4                           X                    yes
5                           X   LTV < 10%        yes

Same but alternative with .eq rather ==

df['Exceptions'] = np.where(np.logical_or(df['Delinquency 2+ Month Change'].eq('X') ,df['Suspect LTV'].eq('LTV < 10%')), 'Yes', 'No')

df['Exceptions'] = np.where(np.logical_or(df['Delinquency 2+ Month Change'].eq('X') ,df['Suspect LTV'].eq('LTV < 10%')), 'Yes', 'No')

In case, you are interested about the way with using defining a function and applying that across ..

Function definition...

def exceptions(row):
  if row['Delinquency 2+ Month Change'] == 'X':
    return 'Yes'
  if row['Suspect LTV'] == 'LTV < 10%':
    return 'Yes'
  else:
    return 'No'

Use the Function with DataFrame.apply() as follows..

df['Exceptions'] = df.apply(exceptions, axis=1)
print(df)
  Delinquency 2+ Month Change Suspect LTV Exceptions
0                           X                    Yes
1                           X                    Yes
2                           X                    Yes
3                           X                    Yes
4                           X                    Yes
5                           X   LTV < 10%        Yes

Upvotes: 0

SpghttCd

Reputation: 10860

You shouldn't iterate over the rows of a dataframe for tasks like this (and apply is just another way of iterating).

You can do simple boolean arithmetic and apply the result to where:

mcve:

import pandas as pd
df = pd.DataFrame({'A': list('xx x '), 'B': list('y y  ')})

df['check'] = 'yes'
df.check.where((df.A=='x') | (df.B=='y'), 'no', inplace=True)

#    A  B check
# 0  x  y   yes                                              
# 1  x      yes                                                   
# 2     y   yes                                              
# 3  x      yes                                              
# 4          no

Upvotes: 0

Dan

Reputation: 10786

The reason that isn't working is because your "exceptions" function needs /both/ columns in order to make a decision, and you're only passing it one of those columns. Instead, you can pass the entire row to your lambda function, and index the columns you need within that function.

def exceptions(row):
    if row['Delinquency 2+ Month Change'] == 'X':
        return 'Yes'
    if row['Suspect LTV'] == 'LTV < 10%':
        return 'Yes'
    return 'No'


df1['Exceptions'] = df1.apply(exceptions, axis = 1)

I changed your exceptions function to accept one entire row instead of a single cell. There was no need to have a for loop inside it. I also changed how you call the apply function. This new code is essentially the same as df1.apply(lambda x : exceptions(x), axis = 1), but you don't need to use lambda if you're passing your data directly to a function and using the result.

Upvotes: 1

Using a lambda expression for multiple columns in python

Answers (3)

DataFrame:

Result:

Related Questions