Reputation: 8108
I have a Pandas Dataframe with different dtypes for the different columns. E.g. df.dtypes returns the following.
Date datetime64[ns]
FundID int64
FundName object
CumPos int64
MTMPrice float64
PricingMechanism object
Several of these columns have missing values. Group operations with NaN values in place cause problems, so getting rid of them with the .fillna() method is the obvious choice. The problem is that the obvious choice for strings is .fillna("") while .fillna(0) is the correct choice for ints and floats. Using either method on the whole DataFrame throws an exception. Are there any elegant solutions besides filling the columns individually (I have about 30 columns)? I have a lot of code depending on the DataFrame and would prefer not to retype the columns, as that is likely to break other logic. I can do:
df.FundID.fillna(0)
df.FundName.fillna("")
etc
Upvotes: 13
Views: 14254
Reputation: 2255
Rather than running the conversion one column at a time, which is inefficient, here is a way to grab all of the int or float columns and fill them in one shot.
int_float_cols = df.select_dtypes(include=['int', 'float']).columns
df[int_float_cols] = df[int_float_cols].fillna(value=0)
It is straightforward to adapt this to handle object columns.
I'm aware that in older versions of pandas, integer columns could not hold NAs, so grabbing the "ints" is not strictly necessary and may accidentally promote ints to floats. However, in our use case, it is better to be safe than sorry.
I ran into this because the ordinary approach, df.fillna(0), corrupted all of the datetime variables.
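The adaptation to object columns mentioned above could look roughly like the sketch below. The sample frame is made up for illustration (column names borrowed from the question), and it assumes the string column really has object dtype, which is why the dtype is set explicitly. Datetime columns are left alone, so a missing date stays NaT.

```python
import numpy as np
import pandas as pd

# Hypothetical sample data; column names borrowed from the question.
df = pd.DataFrame({
    "Date": pd.to_datetime(["2013-01-01", None, "2013-01-03"]),
    "FundName": pd.Series(["A", None, "C"], dtype=object),
    "MTMPrice": [1.5, np.nan, 3.0],
})

# Pick columns by dtype family instead of listing them by hand.
num_cols = df.select_dtypes(include="number").columns
obj_cols = df.select_dtypes(include="object").columns

# Fill each family with its own placeholder; other dtypes are untouched.
df[num_cols] = df[num_cols].fillna(0)
df[obj_cols] = df[obj_cols].fillna("")
```

Selecting with "number" rather than ['int', 'float'] is a choice made here to cover every numeric width; narrow it if only some numeric columns should be filled.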
Upvotes: 2
Reputation: 2294
Similar to @Guddi's answer: a bit verbose, but still more concise than @Ryan's answer, and it keeps all columns:
df[df.select_dtypes("object").columns] = df.select_dtypes("object").fillna("")
Upvotes: 4
Reputation: 101
@Ryan Saxe's answer is accurate. To get it to work on my data I had to set inplace=True and also pass value=0 and value="". See the code below:
for col in df:
    # get dtype for column
    dt = df[col].dtype
    # check if it is a number
    if dt == int or dt == float:
        df[col].fillna(value=0, inplace=True)
    else:
        df[col].fillna(value="", inplace=True)
Upvotes: 1
Reputation: 63
A compact version example:
#replace Nan with '' for columns of type 'object'
df=df.select_dtypes(include='object').fillna('')
However, after the above operation, the dataframe will only contain the 'object' type columns. For keeping all columns, use the solution proposed by @Ryan Saxe.
Upvotes: 3
Reputation: 375535
You can grab the float64 and object columns using:
In [11]: float_cols = df.select_dtypes(include='float64').columns
In [12]: object_cols = df.select_dtypes(include='object').columns
Int columns won't have NaNs; otherwise they would have been upcast to float.
Now you can apply the respective fillnas. One cheeky way:
In [13]: d1 = dict((col, '') for col in object_cols)
In [14]: d2 = dict((col, 0) for col in float_cols)
In [15]: df.fillna(value=dict(d1, **d2))
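As a rough sketch of the same dict-based idea, the per-column fill values can be derived from df.dtypes directly. The helper name build_fill_values and the sample frame are hypothetical, introduced only for illustration; is_numeric_dtype and is_object_dtype come from pandas.api.types.

```python
import numpy as np
import pandas as pd
from pandas.api.types import is_numeric_dtype, is_object_dtype


def build_fill_values(frame):
    """Map each column to a fill value chosen by dtype (hypothetical helper)."""
    fills = {}
    for col, dtype in frame.dtypes.items():
        if is_numeric_dtype(dtype):
            fills[col] = 0
        elif is_object_dtype(dtype):
            fills[col] = ""
        # Anything else (e.g. datetime64) gets no entry and is left as-is.
    return fills


# Hypothetical sample data; column names borrowed from the question.
df = pd.DataFrame({
    "Date": pd.to_datetime(["2013-01-01", None, "2013-01-03"]),
    "FundID": [1, 2, 3],
    "FundName": pd.Series(["A", None, "C"], dtype=object),
    "MTMPrice": [1.5, np.nan, 3.0],
})

fills = build_fill_values(df)
filled = df.fillna(value=fills)
```

Because fillna accepts a dict keyed by column name, columns missing from the dict (here Date) are not filled at all.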
Upvotes: 6
Reputation: 17839
You can iterate through them and use an if statement!
for col in df:
    # get dtype for column
    dt = df[col].dtype
    # check if it is a number
    if dt == int or dt == float:
        # fillna returns a new Series, so assign it back
        df[col] = df[col].fillna(0)
    else:
        df[col] = df[col].fillna("")
When you iterate through a pandas DataFrame, you get the name of each column, so to access those columns you use df[col]. This way you don't need to do it manually, and the script can just go through each column and check its dtype!
Upvotes: 15