Reputation: 11657
I'm trying to fillna per column with a suitable value. My goal is to find the column type at the highest level of generality: at the moment it is either numeric (int/float), string, or pandas Timestamp. I understand that I can detect numeric or string columns using numpy.issubdtype and the type hierarchy, but I haven't found a way to detect Timestamp. My solution uses iloc[0] and isinstance, but is there something better? Here is my code, roughly:
    def df_fillna(df):
        for col in df:
            if np.issubdtype(df[col].dtype, np.number):
                df[col] = df[col].fillna(-1)
            # Inspect the first value to decide whether the column holds timestamps
            elif isinstance(df[col].iloc[0], pd.Timestamp):
                df[col] = df[col].fillna(pd.to_datetime('1900-01-01'))
            else:
                df[col] = df[col].fillna('NaN')
        return df
(Note that I can't use df.loc[0, col] because my index doesn't always contain 0.)
Upvotes: 4
Views: 7123
Reputation: 1216
For me, np.issubdtype(df[col].dtype, np.datetime64) does what you want.
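As a quick check of that claim, here is a minimal sketch that applies both np.issubdtype tests to one column of each kind. The frame `sample` and its column names are made up for illustration, and the string column is forced to object dtype on the assumption that this matches the data in the question.

```python
import numpy as np
import pandas as pd

# Hypothetical sample data: one numeric, one string and one datetime column,
# each with a missing value.
sample = pd.DataFrame({
    "n": pd.Series([1.0, np.nan]),
    "s": pd.Series(["a", None], dtype=object),
    "d": pd.Series(pd.to_datetime(["2009-07-23", None])),
})

# The checks look only at the column dtype, never at an individual value.
is_number = {
    col: bool(np.issubdtype(sample[col].dtype, np.number)) for col in sample
}
is_datetime = {
    col: bool(np.issubdtype(sample[col].dtype, np.datetime64)) for col in sample
}

print(is_number)
print(is_datetime)
```

Because only the dtype is inspected, the check does not depend on the first row holding a real value, and it also works on an empty column, where iloc[0] would raise an IndexError.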
So taking everything together, we have:
    import numpy as np
    import pandas as pd

    def df_fillna(df):
        for col in df:
            if np.issubdtype(df[col].dtype, np.number):
                df[col] = df[col].fillna(-1)
            elif np.issubdtype(df[col].dtype, np.datetime64):
                df[col] = df[col].fillna(pd.to_datetime('1900-01-01'))
            else:
                df[col] = df[col].fillna('NaN')
        return df
An example. Input:
df_test = pd.DataFrame()
df_test['dates'] = [pd.to_datetime("2009-7-23"), pd.to_datetime("2011-7-7"), pd.NaT]
df_test = df_fillna(df_test)
Output:
           dates
    0 2009-07-23
    1 2011-07-07
    2 1900-01-01
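A possible variant, sketched here as an assumption rather than as part of the answer above, uses the helpers in pandas.api.types instead of np.issubdtype. The name df_fillna_typed and the frame df_demo are hypothetical. Unlike df_fillna, this version works on a copy and leaves the input untouched. Note that is_numeric_dtype also returns True for boolean columns, so the datetime check is done first and the fill values are kept the same as above.

```python
import pandas as pd
from pandas.api.types import is_datetime64_any_dtype, is_numeric_dtype


def df_fillna_typed(df):
    """Hypothetical variant of df_fillna built on pandas.api.types."""
    out = df.copy()  # leave the caller's frame unchanged
    for col in out:
        if is_datetime64_any_dtype(out[col]):
            out[col] = out[col].fillna(pd.Timestamp("1900-01-01"))
        elif is_numeric_dtype(out[col]):
            out[col] = out[col].fillna(-1)
        else:
            out[col] = out[col].fillna("NaN")
    return out


# Made-up demo frame with one missing value per column.
df_demo = pd.DataFrame({
    "n": pd.Series([1.0, float("nan")]),
    "s": pd.Series(["a", None], dtype=object),
    "d": pd.Series(pd.to_datetime(["2009-07-23", None])),
})
filled = df_fillna_typed(df_demo)
print(filled)
```

The pandas.api.types helpers accept a Series or a dtype and are designed to cope with pandas extension dtypes, which np.issubdtype may not be able to interpret.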
Upvotes: 6