Reputation: 3427
I am trying to add a new column to a dataframe based on an if statement depending on the values of two columns. i.e. if column x == None then column y else column x
below is the script I have written but doesn't work. any ideas?
dfCurrentReportResults['Retention'] = dfCurrentReportResults.apply(lambda x : x.Retention_y if x.Retention_x == None else x.Retention_x)
Also I got this error message: AttributeError: ("'Series' object has no attribute 'Retention_x'", u'occurred at index BUSINESSUNIT_NAME')
fyi: BUSINESSUNIT_NAME is the first column name
Additional Info:
My data printed out looks like this and I want to add a 3rd column to take a value if there is one else keep NaN.
Retention_x Retention_y
0 1 NaN
1 NaN 0.672183
2 NaN 1.035613
3 NaN 0.771469
4 NaN 0.916667
5 NaN NaN
6 NaN NaN
7 NaN NaN
8 NaN NaN
9 NaN NaN
UPDATE: In the end I was having issues referencing the Null or is Null in my dataframe the final line of code I used also including the axis = 1 answered my question.
dfCurrentReportResults['RetentionLambda'] = dfCurrentReportResults.apply(lambda x : x['Retention_y'] if pd.isnull(x['Retention_x']) else x['Retention_x'], axis = 1)
Thanks @EdChum, @strim099 and @aus_lacy for all your input. As my data set gets larger I may switch to the np.where option if I notice performance issues.
Upvotes: 0
Views: 8422
Reputation: 393933
Just use np.where
:
dfCurrentReportResults['Retention'] = np.where(df.Retention_x == None, df.Retention_y, else df.Retention_x)
This uses the test condition, the first param and sets the value to df.Retention_y
else df.Retention_x
also avoid using apply
where possible as this is just going to loop over the values, np.where
is a vectorised method and will scale much better.
UPDATE
OK no need to use np.where
just use the following simpler syntax:
dfCurrentReportResults['Retention'] = df.Retention_y.where(df.Retention_x == None, df.Retention_x)
Further update
dfCurrentReportResults['Retention'] = df.Retention_y.where(df.Retention_x.isnull(), df.Retention_x)
Upvotes: 2
Reputation: 15466
You'r lambda is operating on the 0 axis which is columnwise. Simply add axis=1
to the apply
arg list. This is clearly documented.
In [1]: import pandas
In [2]: dfCurrentReportResults = pandas.DataFrame([['a','b'],['c','d'],['e','f'],['g','h'],['i','j']], columns=['Retention_y', 'Retention_x'])
In [3]: dfCurrentReportResults['Retention_x'][1] = None
In [4]: dfCurrentReportResults['Retention_x'][3] = None
In [5]: dfCurrentReportResults
Out[5]:
Retention_y Retention_x
0 a b
1 c None
2 e f
3 g None
4 i j
In [6]: dfCurrentReportResults['Retention'] = dfCurrentReportResults.apply(lambda x : x.Retention_y if x.Retention_x == None else x.Retention_x, axis=1)
In [7]: dfCurrentReportResults
Out[7]:
Retention_y Retention_x Retention
0 a b b
1 c None c
2 e f f
3 g None g
4 i j j
Upvotes: 4