Stacey
Stacey

Reputation: 5107

Selecting one of two dataframe columns as input into a new column

I have a data-frame (df) which the head looks like:

             DATE   BBG  FLAG  DATE_DIFF   NEW_DATE    
0      2014-03-04  1333     1      -31.0 2014-04-03 
1      2014-04-04  1333     0      -91.0 2014-04-04 
2      2014-07-04  1333     1      -31.0 2014-04-07 
3      2014-08-04  1333     1      -31.0 2014-04-08 
4      2014-09-04  1333     1      -30.0 2014-04-09 
5      2014-10-04  1333     1      -31.0 2014-04-10 
6      2014-11-04  1333     1      204.0 2014-04-11
7      2014-04-14  1333     0       -1.0 2014-04-14 
8      2014-04-15  1333     0       -1.0 2014-04-15 
9      2014-04-16  1333     0       -1.0 2014-04-16 

I am trying to create a new column called PICKED, which selects a date from either the DATE or NEW_DATE column based on the following:

If FLAG = 1 then if NEW_DATE is a valid date then pick NEW_DATE otherwise pick DATE.

The expected output will look like:

             DATE   BBG   FLAG  DATE_DIFF   NEW_DATE     PICKED
0      2014-03-04  1333      1      -31.0 2014-04-03 2014-04-03
1      2014-04-04  1333      0      -91.0 2014-04-04 2014-04-04
2      2014-07-04  1333      1      -31.0 2014-04-07 2014-04-07
3      2014-08-04  1333      1      -31.0 2014-04-08 2014-04-08
4      2014-09-04  1333      1      -30.0 2014-04-09 2014-04-09
5      2014-10-04  1333      1      -31.0 2014-04-10 2014-04-10
6      2014-11-04  1333      1      204.0 2014-04-11 2014-04-11
7      2014-04-14  1333      0       -1.0 2014-04-14 2014-04-14
8      2014-04-15  1333      0       -1.0 2014-04-15 2014-04-15
9      2014-04-16  1333      0       -1.0 2014-04-16 2014-04-16

I am trying to use the following to achive this:

df['PICKED'] = np.where(df['FLAG'] == 1, df.NEW_DATE.fillna(df.DATE, inplace=True), df['DATE'])

But I get the following returned:

             DATE   BBG   FLAG  DATE_DIFF   NEW_DATE     PICKED
0      2014-03-04  1333      1      -31.0 2014-04-03        NaT
1      2014-04-04  1333      0      -91.0 2014-04-04 2014-04-04
2      2014-07-04  1333      1      -31.0 2014-04-07        NaT
3      2014-08-04  1333      1      -31.0 2014-04-08        NaT
4      2014-09-04  1333      1      -30.0 2014-04-09        NaT
5      2014-10-04  1333      1      -31.0 2014-04-10        NaT
6      2014-11-04  1333      1      204.0 2014-04-11        NaT
7      2014-04-14  1333      0       -1.0 2014-04-14 2014-04-14
8      2014-04-15  1333      0       -1.0 2014-04-15 2014-04-15
9      2014-04-16  1333      0       -1.0 2014-04-16 2014-04-16

Where have I gone wrong?

Upvotes: 1

Views: 47

Answers (1)

Vivek Kalyanarangan
Vivek Kalyanarangan

Reputation: 9081

You were almost there. Use -

df['PICKED'] = np.where(df['FLAG'] == 1, df.NEW_DATE.fillna(df.DATE), df['DATE'])

Output

    DATE    BBG FLAG    DATE_DIFF   NEW_DATE    PICKED
0   2014-03-04  1333    1   -31.0   2014-04-03  2014-04-03
1   2014-04-04  1333    0   -91.0   2014-04-04  2014-04-04
2   2014-07-04  1333    1   -31.0   2014-04-07  2014-04-07
3   2014-08-04  1333    1   -31.0   2014-04-08  2014-04-08
4   2014-09-04  1333    1   -30.0   2014-04-09  2014-04-09
5   2014-10-04  1333    1   -31.0   2014-04-10  2014-04-10
6   2014-11-04  1333    1   204.0   2014-04-11  2014-04-11
7   2014-04-14  1333    0   -1.0    2014-04-14  2014-04-14
8   2014-04-15  1333    0   -1.0    2014-04-15  2014-04-15
9   2014-04-16  1333    0   -1.0    2014-04-16  2014-04-16

Explanation

Using the inplace parameter modifies the df directly. It is great if you use the fillna() as you were using in the line before and then use df.NEW_DATE directly in np.where()

But, the way you are using the fillna() needs something to be returned to the np.where. Specifying inplace=True returns None but modifies the df directly.

In this case as is evident you needed to return something for the np.where() parameter and hence removing the inplace=True (which defaults to inplace=False) gets you what you need.

Upvotes: 1

Related Questions