DropKick

Reputation: 99

Pandas: Change output values from float to int from columns with NaN values

I have a simple function that binary-encodes the "t" and "f" strings found in my DataFrame. Note: the DataFrame has missing values.

def binary_encoding(df):
    return df.replace({"t":1, "f":0})

The output is in float form, but I want int values.

I tried:

def binary_encoding(df):
    encode = df.replace({"t":1, "f":0})
    return int(encode)

but I get an error: int() argument must be a string, a bytes-like object or a number, not 'DataFrame'.

Upvotes: 1

Views: 1384

Answers (2)

SeaBean

Reputation: 23217

You can use the nullable integer data type (Int64), which supports missing values.

Suppose you have 4 columns, 3 of which contain NaN values and one of which does not:

import numpy as np
import pandas as pd

df = pd.DataFrame({'Col1': ['f', 't', np.nan], 'Col2': [np.nan, 'f', 't'], 'Col3': ['f', np.nan, 't'], 'Col4': ['f', 't', 'f']})


  Col1 Col2 Col3 Col4
0    f  NaN    f    f
1    t    f  NaN    t
2  NaN    t    t    f

Now, after binary encoding with your function:

def binary_encoding(df):
    return df.replace({"t":1, "f":0})

new_df = binary_encoding(df)

print(new_df)


   Col1  Col2  Col3  Col4
0   0.0   NaN   0.0     0
1   1.0   0.0   NaN     1
2   NaN   1.0   1.0     0

Data types of new_df:

new_df.dtypes

Col1    float64
Col2    float64
Col3    float64
Col4      int64
dtype: object
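
The float64 columns above come from NaN itself being a float: a plain NumPy int64 column cannot hold it, so pandas upcasts. A minimal sketch of that behaviour (the variable names here are only illustrative):

```python
import numpy as np
import pandas as pd

# A column containing NaN is stored as float64; one without NaN stays int64.
with_nan = pd.Series([0, 1, np.nan])
without_nan = pd.Series([0, 1, 0])
print(with_nan.dtype)
print(without_nan.dtype)

# Casting the NaN column to a plain NumPy integer dtype raises an error
# (a ValueError or a subclass of it, depending on the pandas version).
try:
    with_nan.astype("int64")
    cast_failed = False
except ValueError as exc:
    cast_failed = True
    print("cast failed:", exc)
```

This is also why a plain astype(int) does not work while NaN values are still present.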

Convert the data type to the nullable integer type Int64:

new_df_int = new_df.astype('Int64')


print(new_df_int)


   Col1  Col2  Col3  Col4
0     0  <NA>     0     0
1     1     0  <NA>     1
2  <NA>     1     1     0

Data types of new_df_int:

new_df_int.dtypes

Col1    Int64
Col2    Int64
Col3    Int64
Col4    Int64
dtype: object

You now have an integer data type, and the values display as integers, as you wanted.

You can also apply the data type conversion to individual columns instead of the whole DataFrame, e.g.:

new_df['Col1'] = new_df['Col1'].astype('Int64')
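
If you would rather do the encoding and the conversion in one function, something like the following might work. It is only a sketch: the name binary_encoding_nullable is made up for illustration, and it uses Series.map on each column rather than replace, which means any value other than "t" or "f" also ends up as a missing value.

```python
import numpy as np
import pandas as pd


def binary_encoding_nullable(df):
    """Map 't'/'f' to 1/0 and return nullable Int64 columns (hypothetical helper)."""
    mapping = {"t": 1, "f": 0}
    # Series.map keeps NaN as NaN; values not found in `mapping` also become NaN.
    encoded = df.apply(lambda col: col.map(mapping))
    # Int64 (capital I) stores missing values as <NA> instead of forcing float64.
    return encoded.astype("Int64")


df = pd.DataFrame({
    "Col1": ["f", "t", np.nan],
    "Col2": [np.nan, "f", "t"],
    "Col4": ["f", "t", "f"],
})
result = binary_encoding_nullable(df)
print(result)
print(result.dtypes)
```

Note the capital "I": astype('int64') is the plain NumPy dtype and cannot hold missing values, whereas 'Int64' is the pandas nullable one.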

Upvotes: 2

myz540

Reputation: 559

If you want to return the dataframe with int values, try:

def binary_encoding(df):
    encode = df.replace({"t":1, "f":0})
    return encode.astype("int")

UPDATE:

If you have NaN values in your dataframe, decide how you want to handle them before converting: you can either drop them with df.dropna() or fill them with df.fillna(). You can handle infs similarly by first converting them to NaN with df.replace([np.inf, -np.inf], np.nan).

Something like:

import numpy as np

def handle_na_and_inf(df):
    df = df.replace([np.inf, -np.inf], np.nan)  # treat infinities as missing
    df = df.dropna()  # use fillna() if you want to fill with another value
    return df
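
To tie this back to the t/f encoding, here is a rough sketch of both options. The helper names encode and binary_encoding_filled are hypothetical, and the fill value of -1 is just an arbitrary placeholder for "missing"; pick whatever makes sense for your data. The encoding uses Series.map per column, so values other than "t" and "f" are treated as missing too.

```python
import numpy as np
import pandas as pd


def encode(df):
    """Map 't'/'f' to 1/0 column by column; anything else becomes NaN."""
    mapping = {"t": 1, "f": 0}
    return df.apply(lambda col: col.map(mapping))


def binary_encoding_filled(df, fill_value=-1):
    """Fill missing values with a sentinel, then cast to a plain integer dtype."""
    # fill_value=-1 is an assumed sentinel, not something pandas prescribes.
    return encode(df).fillna(fill_value).astype("int64")


df = pd.DataFrame({
    "Col1": ["f", "t", np.nan],
    "Col2": [np.nan, "f", "t"],
    "Col3": ["f", np.nan, "t"],
    "Col4": ["f", "t", "f"],
})

filled = binary_encoding_filled(df)
print(filled)

# dropna() removes every row that contains at least one NaN.
dropped = encode(df).dropna()
print(len(dropped))
```

Be careful with dropna(): in this small frame every row has a NaN in some column, so dropping leaves nothing behind. Filling is usually the safer choice when missing values are spread across many rows.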

Upvotes: 2
