Joop

Reputation: 8108

Pandas Dataframe object types fillna exception over different datatypes

I have a Pandas Dataframe with different dtypes for the different columns. E.g. df.dtypes returns the following.

Date                    datetime64[ns]
FundID                           int64
FundName                        object
CumPos                           int64
MTMPrice                       float64
PricingMechanism                object

Several of these columns have missing values. Group operations fail while the NaN values are in place, so getting rid of them with the .fillna() method is the obvious choice. The problem is that the natural fill for strings is .fillna("") while .fillna(0) is the correct choice for ints and floats. Using either method on the whole DataFrame throws an exception. Are there any elegant solutions besides filling the columns individually (I have about 30 columns)? A lot of my code depends on the DataFrame, and I would prefer not to retype the columns, as that is likely to break some other logic. I can do:

df.FundID.fillna(0)
df.FundName.fillna("")
etc

Upvotes: 13

Views: 14254

Answers (6)

pauljohn32

Reputation: 2255

Rather than running the conversion one column at a time, which is inefficient, here is a way to grab all of the int or float columns and fill them in one shot.

int_float_cols = df.select_dtypes(include=['int', 'float']).columns
df[int_float_cols] = df[int_float_cols].fillna(value=0)

It should be obvious how to adapt this to handle object columns.

I'm aware that older versions of Pandas did not allow NAs in integer columns, so grabbing the "ints" is not strictly necessary and may accidentally promote ints to floats. However, in our use case, it is better to be safe than sorry.

I ran into this because the ordinary approach, df.fillna(0), corrupted all of the datetime variables.
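As a minimal sketch of that adaptation, the same pattern can be applied once for numeric columns and once for object columns, which leaves datetime columns alone. The sample frame below is hypothetical (only loosely modeled on the question's columns), and the string column is created with an explicit object dtype as an assumption so the selection behaves the same across pandas versions.

```python
import numpy as np
import pandas as pd

# Hypothetical sample data, not taken from the question.
df = pd.DataFrame({
    "Date": pd.to_datetime(["2013-01-01", None, "2013-01-03"]),
    "FundName": pd.Series(["A", None, "C"], dtype=object),
    "MTMPrice": [1.5, np.nan, 3.0],
})

# Select columns by dtype family, then fill each group with its own value.
num_cols = df.select_dtypes(include="number").columns
obj_cols = df.select_dtypes(include="object").columns

df[num_cols] = df[num_cols].fillna(0)
df[obj_cols] = df[obj_cols].fillna("")

# The datetime column is in neither group, so its missing value stays NaT.
print(df)
```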

Upvotes: 2

nik

Reputation: 2294

Similar to @Guddi's answer. A bit verbose, but still more concise than @Ryan's answer, and it keeps all columns:

df[df.select_dtypes("object").columns] = df.select_dtypes("object").fillna("")

Upvotes: 4

alanindublin

Reputation: 101

@Ryan Saxe's answer is accurate. To get it to work on my data I had to set inplace=True and pass the fill value for each column explicitly. See the code below:

for col in df:
    # get dtype for column
    dt = df[col].dtype
    # check if it is a number
    if dt == int or dt == float:
        # fill through the DataFrame so the in-place update is not lost on a temporary column
        df.fillna({col: 0}, inplace=True)
    else:
        df.fillna({col: ""}, inplace=True)

Upvotes: 1

Guddi

Reputation: 63

A compact version example:

# replace NaN with '' for columns of type 'object'
df = df.select_dtypes(include='object').fillna('')

However, after the above operation, the dataframe will only contain the 'object' type columns. For keeping all columns, use the solution proposed by @Ryan Saxe.

Upvotes: 3

Andy Hayden

Reputation: 375535

You can grab the float64 and object columns using:

In [11]: float_cols = df.blocks['float64'].columns

In [12]: object_cols = df.blocks['object'].columns

Int columns won't have NaNs; otherwise they would have been upcast to float.

Now you can apply the respective fillnas. One cheeky way:

In [13]: d1 = dict((col, '') for col in object_cols)

In [14]: d2 = dict((col, 0) for col in float_cols)

In [15]: df.fillna(value=dict(d1, **d2))
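For a pandas version where df.blocks is not available, a rough equivalent of the same idea might look like the sketch below. It assumes select_dtypes as a stand-in for the column lookup, and the sample frame and the names fill_values and filled are hypothetical. The per-column dict passed to fillna is the same trick as above.

```python
import numpy as np
import pandas as pd

# Hypothetical sample data, not taken from the question.
df = pd.DataFrame({
    "FundName": pd.Series(["A", None], dtype=object),
    "MTMPrice": [np.nan, 2.0],
    "CumPos": [1, 2],
})

# Look up the column names by dtype (assumed stand-in for df.blocks).
float_cols = df.select_dtypes(include="float64").columns
object_cols = df.select_dtypes(include="object").columns

# Build one {column: fill value} mapping covering both groups.
fill_values = {col: "" for col in object_cols}
fill_values.update({col: 0 for col in float_cols})

# fillna with a dict fills each listed column with its own value
# and returns a new DataFrame; df itself is not modified.
filled = df.fillna(value=fill_values)
print(filled)
```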

Upvotes: 6

Ryan Saxe

Reputation: 17839

You can iterate through them and use an if statement!

for col in df:
    # get dtype for column
    dt = df[col].dtype
    # check if it is a number
    if dt == int or dt == float:
        # fillna returns a new Series, so assign it back to the column
        df[col] = df[col].fillna(0)
    else:
        df[col] = df[col].fillna("")

When you iterate through a pandas DataFrame, you will get the names of each of the columns, so to access those columns, you use df[col]. This way you don't need to do it manually and the script can just go through each column and check its dtype!
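One possible variation on this loop, sketched here under the assumption that datetime columns should be left untouched, uses the dtype helpers in pandas.api.types instead of comparing against int and float. The function name fill_by_dtype and its parameters are hypothetical, not part of pandas.

```python
import pandas as pd
from pandas.api.types import is_numeric_dtype, is_object_dtype


def fill_by_dtype(frame, num_fill=0, obj_fill=""):
    """Return a copy of frame with NaNs filled according to column dtype.

    Hypothetical helper: numeric columns get num_fill, object columns
    get obj_fill, and every other dtype is left as it is.
    """
    out = frame.copy()
    for col in out.columns:
        if is_numeric_dtype(out[col]):
            out[col] = out[col].fillna(num_fill)
        elif is_object_dtype(out[col]):
            out[col] = out[col].fillna(obj_fill)
        # Other dtypes (for example datetime64) fall through unchanged.
    return out
```

Calling fill_by_dtype(df) returns a new DataFrame, so code that still depends on the original df is unaffected.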

Upvotes: 15
