Reputation: 5107
I have a data-frame (df
) which the head looks like:
DATE BBG FLAG DATE_DIFF NEW_DATE
0 2014-03-04 1333 1 -31.0 2014-04-03
1 2014-04-04 1333 0 -91.0 2014-04-04
2 2014-07-04 1333 1 -31.0 2014-04-07
3 2014-08-04 1333 1 -31.0 2014-04-08
4 2014-09-04 1333 1 -30.0 2014-04-09
5 2014-10-04 1333 1 -31.0 2014-04-10
6 2014-11-04 1333 1 204.0 2014-04-11
7 2014-04-14 1333 0 -1.0 2014-04-14
8 2014-04-15 1333 0 -1.0 2014-04-15
9 2014-04-16 1333 0 -1.0 2014-04-16
I am trying to create a new column called PICKED
, which selects a date from either the DATE
or NEW_DATE
column based on the following:
If FLAG = 1
then if NEW_DATE
is a valid date then pick NEW_DATE
otherwise pick DATE.
The expected output will look like:
DATE BBG FLAG DATE_DIFF NEW_DATE PICKED
0 2014-03-04 1333 1 -31.0 2014-04-03 2014-04-03
1 2014-04-04 1333 0 -91.0 2014-04-04 2014-04-04
2 2014-07-04 1333 1 -31.0 2014-04-07 2014-04-07
3 2014-08-04 1333 1 -31.0 2014-04-08 2014-04-08
4 2014-09-04 1333 1 -30.0 2014-04-09 2014-04-09
5 2014-10-04 1333 1 -31.0 2014-04-10 2014-04-10
6 2014-11-04 1333 1 204.0 2014-04-11 2014-04-11
7 2014-04-14 1333 0 -1.0 2014-04-14 2014-04-14
8 2014-04-15 1333 0 -1.0 2014-04-15 2014-04-15
9 2014-04-16 1333 0 -1.0 2014-04-16 2014-04-16
I am trying to use the following to achive this:
df['PICKED'] = np.where(df['FLAG'] == 1, df.NEW_DATE.fillna(df.DATE, inplace=True), df['DATE'])
But I get the following returned:
DATE BBG FLAG DATE_DIFF NEW_DATE PICKED
0 2014-03-04 1333 1 -31.0 2014-04-03 NaT
1 2014-04-04 1333 0 -91.0 2014-04-04 2014-04-04
2 2014-07-04 1333 1 -31.0 2014-04-07 NaT
3 2014-08-04 1333 1 -31.0 2014-04-08 NaT
4 2014-09-04 1333 1 -30.0 2014-04-09 NaT
5 2014-10-04 1333 1 -31.0 2014-04-10 NaT
6 2014-11-04 1333 1 204.0 2014-04-11 NaT
7 2014-04-14 1333 0 -1.0 2014-04-14 2014-04-14
8 2014-04-15 1333 0 -1.0 2014-04-15 2014-04-15
9 2014-04-16 1333 0 -1.0 2014-04-16 2014-04-16
Where have I gone wrong?
Upvotes: 1
Views: 47
Reputation: 9081
You were almost there. Use -
df['PICKED'] = np.where(df['FLAG'] == 1, df.NEW_DATE.fillna(df.DATE), df['DATE'])
Output
DATE BBG FLAG DATE_DIFF NEW_DATE PICKED
0 2014-03-04 1333 1 -31.0 2014-04-03 2014-04-03
1 2014-04-04 1333 0 -91.0 2014-04-04 2014-04-04
2 2014-07-04 1333 1 -31.0 2014-04-07 2014-04-07
3 2014-08-04 1333 1 -31.0 2014-04-08 2014-04-08
4 2014-09-04 1333 1 -30.0 2014-04-09 2014-04-09
5 2014-10-04 1333 1 -31.0 2014-04-10 2014-04-10
6 2014-11-04 1333 1 204.0 2014-04-11 2014-04-11
7 2014-04-14 1333 0 -1.0 2014-04-14 2014-04-14
8 2014-04-15 1333 0 -1.0 2014-04-15 2014-04-15
9 2014-04-16 1333 0 -1.0 2014-04-16 2014-04-16
Explanation
Using the inplace
parameter modifies the df
directly. It is great if you use the fillna()
as you were using in the line before and then use df.NEW_DATE
directly in np.where()
But, the way you are using the fillna()
needs something to be returned to the np.where
. Specifying inplace=True
returns None
but modifies the df
directly.
In this case as is evident you needed to return something for the np.where()
parameter and hence removing the inplace=True
(which defaults to inplace=False
) gets you what you need.
Upvotes: 1