user3046211

Reputation: 696

Replacing '-' with np.nan converts the column dtype to float in pandas

I have a pandas DataFrame like the one below:

 +------+----+
 |  x   |  y |
 +------+----+
 |ABCD  | -  |           
 |DEFG  | -  |
 +------+----+

Both x and y have dtype object. I replace '-' with NaN using:

df = df.replace('-', np.nan)

This converts column y to float, whereas I expected it to remain object. Also, when I try to list the columns that contain NA values after the replacement, no columns are reported, even though column y has NA values. What causes this?

EDIT: I am able to find the columns that have NA values as follows:

df.columns[df.isna().any()].tolist()

Upvotes: 4

Views: 2199

Answers (1)

jezrael

Reputation: 863791

The reason is that a column left with only NaN values is converted to float, because NaN is itself a float. A possible solution is to cast back to the original dtypes with DataFrame.astype:

df = df.replace('-', np.nan).astype(df.dtypes)

print (df.dtypes)
x    object
y    object
dtype: object

# applymap is deprecated since pandas 2.1; use df.map(type) there
print (df.applymap(type))
               x                y
0  <class 'str'>  <class 'float'>
1  <class 'str'>  <class 'float'>
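
The steps above can be put together as a small self-contained sketch. It rebuilds the frame from the question, with `dtype=object` forced as an assumption so that both columns start as object whatever the pandas version. Whether `replace` changes the dtype of `y` may depend on the pandas version, so the sketch prints the dtype rather than assuming it. The variable names (`original_dtypes`, `replaced`, `restored`) are illustrative only.

```python
import numpy as np
import pandas as pd

# Sample frame from the question; dtype=object is forced here (an assumption)
# so that both columns start as object on any pandas version.
df = pd.DataFrame({"x": ["ABCD", "DEFG"], "y": ["-", "-"]}, dtype=object)

original_dtypes = df.dtypes          # remember the dtypes before replacing
replaced = df.replace("-", np.nan)   # y may come back as float64 here
dtype_after_replace = replaced["y"].dtype

# Cast every column back to the dtype it had before the replacement.
restored = replaced.astype(original_dtypes)

# The column dtype is object again, but each missing value is still a float NaN.
element_types = [type(v) for v in restored["y"]]

print(dtype_after_replace)
print(restored.dtypes)
print(element_types)
```

Casting back only changes the column's dtype label. The values in `y` are still NaN floats, which is why `isna()` keeps working on the restored frame.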

To find the columns that contain missing values, use:

print (df.columns[df.isna().any()])
Index(['y'], dtype='object')

Another, similar idea is to select only the columns that are entirely NaN and convert them to object:

df = df.replace('-', np.nan)


d = dict.fromkeys(df.columns[df.isna().all()], 'object')
print (d)
{'y': 'object'}

df = df.astype(d)

print (df.dtypes)
x    object
y    object
dtype: object
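
This second approach relies on `isna().all()`, which selects only columns where every value is missing, whereas `isna().any()` also reports columns that are only partly missing. The sketch below shows the difference. It adds a hypothetical third column `z` that is not in the question, and it forces `dtype=object` at construction as an assumption.

```python
import numpy as np
import pandas as pd

df = pd.DataFrame(
    {
        "x": ["ABCD", "DEFG"],
        "y": ["-", "-"],
        "z": ["-", "keep"],  # hypothetical column, only partly missing
    },
    dtype=object,
)
df = df.replace("-", np.nan)

# Columns where every value is NaN: these are the ones that may turn float.
all_nan_cols = df.columns[df.isna().all()]
# Columns with at least one NaN: a superset of the above.
any_nan_cols = df.columns[df.isna().any()]

# Cast only the all-NaN columns back to object; the others are left alone.
df = df.astype(dict.fromkeys(all_nan_cols, "object"))

print(list(all_nan_cols))
print(list(any_nan_cols))
print(df.dtypes)
```

Casting only the all-NaN columns leaves any numeric columns in the frame untouched, which a blanket `astype(object)` would not.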

Upvotes: 1
