Reputation: 7693
I have a data frame like as shown below
import pandas as pd
import numpy as np
df = pd.DataFrame({'source_value':['Male','Female',np.nan,np.nan,np.nan,'M'],
'new_id':[1,2,3,4,5,6],
'month_of_birth':[11,12,1,3,5,6],
'day_of_birth':[11,21,23,26,10,12],
'year_of_birth':[1967,1987,1956,1999,2005,1987],
'datetime_off':['11/11/1967','21/12/1987','23/01/1956','26/03/1999','10/05/2005','12/06/1987'],
'test_id':[np.nan,np.nan,np.nan,np.nan,np.nan,np.nan]})
I would like to fill missing values in the column with keywords id
, value
and datetime
.
I tried the below based on startswith
, endswith
and contains
col = df.columns.str
c1 = col.endswith('id')
c2 = col.contains('value')
c3 = col.contains('datetime')
missing_value_filled = np.select([c1,c2,c3],[df.fillna(0),df.fillna(np.nan),df.fillna("01/01/2000 00:00:00")])
pd.DataFrame(missing_value_filled, columns=df.columns)
But the problem is it makes month_of_birth
, day_of_birth
and year_of_birth
as zeroes
even though they don't match my pattern mentioned above. May I know why does this happen?
How can I retain the original values of month
, day
and year
of birth columns?
I get an output like below which is incorrect
My expected output is given below
Upvotes: 1
Views: 297
Reputation: 71689
Let us redefine the fillna
function that takes the arguments as input df
, column masks (col_masks
) along with the corresponding fill values (fill_values
):
def fillna(df, col_masks, fill_values):
df = df.copy()
for m, v in zip(col_masks, fill_values):
df.loc[:, m] = df.loc[:, m].fillna(v)
return df
>>> fillna(df, [c1, c2, c3], [0, np.nan, '01/01/2000 00:00:00'])
source_value new_id month_of_birth day_of_birth year_of_birth datetime_off test_id
0 Male 1 11 11 1967 11/11/1967 0.0
1 Female 2 12 21 1987 21/12/1987 0.0
2 NaN 3 1 23 1956 23/01/1956 0.0
3 NaN 4 3 26 1999 26/03/1999 0.0
4 NaN 5 5 10 2005 10/05/2005 0.0
5 M 6 6 12 1987 12/06/1987 0.0
Upvotes: 1