Reputation: 458
I have this data:
I want to make a new column where if we have an 'X' in delinquency or Suspect LTV has a 'LTV < 10%' Then I will have a 'Yes' in an entry in a new column.
Example of what the output should look like using this data:
Now if there is no X and no LTV < 10% then I will just have a 'No'. I am going to be applying this across many columns but I just focused on two to not make it more complicated.
This is what I have tried:
def exceptions(column):
for i in column:
if i == 'X':
return 'Yes'
elif i == 'LTV < 10%':
return 'Yes'
else:
return 'No'
df1['Exceptions'] = df1.apply(lambda x : exceptions(x['Delinquency 2+ Month Change']), axis = 1)
df1['Exceptions'] = df1.apply(lambda x : exceptions(x['Suspect LTV']), axis = 1)
Upvotes: 0
Views: 1544
Reputation: 8816
As @Erfan mention in the comment section, you don't really need a for loop or lambda
to achieve this, You need np.where
.
np.where
np.where
with np.logical_or
>>> df
Delinquency 2+ Month Change Suspect LTV
0 X
1 X
2 X
3 X
4 X
5 X LTV < 10%
1- Use OR operator as pipe |
with np.where
>>> df['Exceptions'] = np.where((df['Delinquency 2+ Month Change'] == 'X') | (df['Suspect LTV'] == 'LTV < 10%'), 'Yes', 'No')
>>> df
Delinquency 2+ Month Change Suspect LTV Exceptions
0 X Yes
1 X Yes
2 X Yes
3 X Yes
4 X Yes
5 X LTV < 10% Yes
2- Second method as i mention at the start with np.where
with np.logical_or
>>> df['Exceptions'] = np.where(np.logical_or(df['Delinquency 2+ Month Change'] == 'X', df['Suspect LTV'] == 'LTV < 10%'), 'yes', 'no')
>>> df
Delinquency 2+ Month Change Suspect LTV Exceptions
0 X yes
1 X yes
2 X yes
3 X yes
4 X yes
5 X LTV < 10% yes
Same but alternative with .eq
rather ==
df['Exceptions'] = np.where(np.logical_or(df['Delinquency 2+ Month Change'].eq('X') ,df['Suspect LTV'].eq('LTV < 10%')), 'Yes', 'No')
OR
df['Exceptions'] = np.where(np.logical_or(df['Delinquency 2+ Month Change'].eq('X') ,df['Suspect LTV'].eq('LTV < 10%')), 'Yes', 'No')
In case, you are interested about the way with using defining a function and applying that across ..
Function definition...
def exceptions(row):
if row['Delinquency 2+ Month Change'] == 'X':
return 'Yes'
if row['Suspect LTV'] == 'LTV < 10%':
return 'Yes'
else:
return 'No'
Use the Function with DataFrame.apply()
as follows..
df['Exceptions'] = df.apply(exceptions, axis=1)
print(df)
Delinquency 2+ Month Change Suspect LTV Exceptions
0 X Yes
1 X Yes
2 X Yes
3 X Yes
4 X Yes
5 X LTV < 10% Yes
Upvotes: 0
Reputation: 10860
You shouldn't iterate over the rows of a dataframe for tasks like this (and apply
is just another way of iterating).
You can do simple boolean arithmetic and apply the result to where
:
mcve:
import pandas as pd
df = pd.DataFrame({'A': list('xx x '), 'B': list('y y ')})
df['check'] = 'yes'
df.check.where((df.A=='x') | (df.B=='y'), 'no', inplace=True)
# A B check
# 0 x y yes
# 1 x yes
# 2 y yes
# 3 x yes
# 4 no
Upvotes: 0
Reputation: 10786
The reason that isn't working is because your "exceptions" function needs /both/ columns in order to make a decision, and you're only passing it one of those columns. Instead, you can pass the entire row to your lambda function, and index the columns you need within that function.
def exceptions(row):
if row['Delinquency 2+ Month Change'] == 'X':
return 'Yes'
if row['Suspect LTV'] == 'LTV < 10%':
return 'Yes'
return 'No'
df1['Exceptions'] = df1.apply(exceptions, axis = 1)
I changed your exceptions function to accept one entire row instead of a single cell. There was no need to have a for loop inside it. I also changed how you call the apply function. This new code is essentially the same as df1.apply(lambda x : exceptions(x), axis = 1)
, but you don't need to use lambda
if you're passing your data directly to a function and using the result.
Upvotes: 1