Reputation: 99
I have a simple function that binary encodes t and f strings found within my dataframe. Note: df has missing values.
def binary_encoding(df):
    return df.replace({"t":1, "f":0})
The output is in float form, but I want int values.
I tried:
def binary_encoding(df):
    encode = df.replace({"t":1, "f":0})
    return int(encode)
but I get the error: int() argument must be a string, a bytes-like object or a number, not 'DataFrame'.
Upvotes: 1
Views: 1384
Reputation: 23217
Suppose you have 4 columns as follows: 3 columns contain NaN values and one column has none.
df = pd.DataFrame({'Col1': ['f', 't', np.nan], 'Col2': [np.nan, 'f', 't'], 'Col3': ['f', np.nan, 't'], 'Col4': ['f', 't', 'f']})
  Col1 Col2 Col3 Col4
0    f  NaN    f    f
1    t    f  NaN    t
2  NaN    t    t    f
Now, after your binary encoding by your function:
def binary_encoding(df):
    return df.replace({"t":1, "f":0})
new_df = binary_encoding(df)
print(new_df)
   Col1  Col2  Col3  Col4
0   0.0   NaN   0.0     0
1   1.0   0.0   NaN     1
2   NaN   1.0   1.0     0
Data types of new_df:
new_df.dtypes
Col1    float64
Col2    float64
Col3    float64
Col4      int64
dtype: object
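The float64 columns above are a consequence of how NaN is stored: NaN is itself a float, so a plain NumPy integer column cannot hold it and pandas upcasts to float64. A minimal sketch of that behaviour (the variable names are only illustrative):

```python
import numpy as np
import pandas as pd

# A column containing NaN cannot stay int64, so pandas stores it as float64.
with_nan = pd.Series([0, 1, np.nan])
# A column with no missing values keeps the integer dtype.
without_nan = pd.Series([0, 1, 0])

print(with_nan.dtype)
print(without_nan.dtype)
```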
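To get integers despite the missing values, convert to the pandas nullable integer dtype Int64 (note the capital I):
new_df_int = new_df.astype('Int64')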
print(new_df_int)
   Col1  Col2  Col3  Col4
0     0  <NA>     0     0
1     1     0  <NA>     1
2  <NA>     1     1     0
Data types of new_df_int:
new_df_int.dtypes
Col1    Int64
Col2    Int64
Col3    Int64
Col4    Int64
dtype: object
All columns now have an integer data type and display as integers, as you wanted.
You can also apply the data type conversion to individual columns instead of the whole dataframe, e.g.:
new_df['Col1'] = new_df['Col1'].astype('Int64')
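Putting the encoding and the conversion into one function could look like the sketch below. It is only one possible arrangement: it uses Series.map per column instead of replace (an assumption made here so the result does not depend on how a given pandas version infers dtypes after replace), which also means any value other than "t" or "f" ends up missing. The names example and encoded are illustrative.

```python
import numpy as np
import pandas as pd

def binary_encoding(df):
    # Map "t"/"f" to 1/0 column by column; every other value (including NaN) becomes NaN.
    encoded = df.apply(lambda col: col.map({"t": 1, "f": 0}))
    # Int64 (capital I) is the nullable integer dtype, so missing values show as <NA>.
    return encoded.astype("Int64")

example = pd.DataFrame({"Col1": ["f", "t", np.nan], "Col4": ["f", "t", "f"]})
encoded = binary_encoding(example)
print(encoded)
print(encoded.dtypes)
```

If some columns hold values other than t and f that should be kept, replace followed by astype('Int64') on the relevant columns, as shown above, is the safer route.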
Upvotes: 2
Reputation: 559
If you want to return the dataframe with int values, try the following (this only works when the dataframe has no NaN values, see the update below):
def binary_encoding(df):
    encode = df.replace({"t":1, "f":0})
    return encode.astype("int")
UPDATE:
If you have NaN values in your dataframe, decide how you want to handle them first: you can either drop them with df.dropna() or fill them with df.fillna(). You can handle infs similarly by first converting them to NaN with df.replace([np.inf, -np.inf], np.nan).
Something like:
import numpy as np
def handle_na_and_inf(df):
    # Treat +/- infinity as missing so it is handled together with NaN
    df = df.replace([np.inf, -np.inf], np.nan)
    df = df.dropna() # use fillna() if you want to fill with another value
    return df
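As a rough sketch of the fillna() route: encode first, fill the gaps with a sentinel, then cast to a plain integer dtype. The function name encode_and_fill and the default sentinel -1 are assumptions chosen for illustration, and Series.map is used here in place of replace; pick whatever fill value makes sense for your data.

```python
import numpy as np
import pandas as pd

def encode_and_fill(df, fill_value=-1):
    # Map "t"/"f" to 1/0; NaN (and any other value) stays missing at this point.
    encoded = df.apply(lambda col: col.map({"t": 1, "f": 0}))
    # With the gaps filled, a cast to the ordinary int64 dtype no longer fails.
    return encoded.fillna(fill_value).astype("int64")

sample = pd.DataFrame({"Col1": ["f", "t", np.nan], "Col2": [np.nan, "f", "t"]})
filled = encode_and_fill(sample)
print(filled)
print(filled.dtypes)
```

Unlike Int64, this gives ordinary NumPy integers, at the cost of the sentinel being indistinguishable from real data unless you remember what it means.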
Upvotes: 2