Reputation: 696
I have a pandas DataFrame df as below:
+------+----+
| x | y |
+------+----+
|ABCD | - |
|DEFG | - |
+------+----+
with the data types of x and y being object. I replace the '-' with NaN using the below:
df = df.replace('-', np.nan)
This converts the data type of column y to float, while I expected it to remain object. Also, when I try to list the columns having NA values after the replacement, no columns are reported, even though column y has NA values. What is causing this?
EDIT: I'm able to find the columns having NA values as below:
df.columns[df.isna().any()].tolist()
Upvotes: 4
Views: 2199
Reputation: 863791
The reason is that a column containing only NaNs is converted to float. A possible solution is to use DataFrame.astype with the original dtypes:
df = df.replace('-', np.nan).astype(df.dtypes)
print (df.dtypes)
x object
y object
dtype: object
print (df.applymap(type))
x y
0 <class 'str'> <class 'float'>
1 <class 'str'> <class 'float'>
If you want to test which columns have missing values, use:
print (df.columns[df.isna().any()])
Index(['y'], dtype='object')
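For reference, the steps above can be put together as a small self-contained sketch. The sample data mirrors the question; the names original_dtypes, result and na_columns are only illustrative, and the frame is built with dtype=object explicitly so that the starting dtypes match the question regardless of pandas version.

```python
import numpy as np
import pandas as pd

# Sample frame from the question; dtype=object is forced explicitly (assumption)
df = pd.DataFrame({"x": ["ABCD", "DEFG"], "y": ["-", "-"]}, dtype=object)

# Remember the dtypes before replacing, as a column -> dtype mapping
original_dtypes = df.dtypes.to_dict()

# Replace '-' with NaN, then cast every column back to its original dtype
result = df.replace("-", np.nan).astype(original_dtypes)

# Columns that contain at least one missing value
na_columns = result.columns[result.isna().any()].tolist()

print(result.dtypes)
print(na_columns)
```

Note that the values in y are still float NaN objects; only the column's dtype is object.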
Another similar idea is to select only the columns that are entirely NaN and convert them to object:
df = df.replace('-', np.nan)
d = dict.fromkeys(df.columns[df.isna().all()], 'object')
print (d)
{'y': 'object'}
df = df.astype(d)
print (df.dtypes)
x object
y object
dtype: object
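A runnable sketch of this second approach, under the same assumptions as the question's sample data (the names replaced, all_nan_cols, to_object and fixed are illustrative only). It casts just the all-NaN columns and leaves the others untouched:

```python
import numpy as np
import pandas as pd

# Sample frame from the question; dtype=object is forced explicitly (assumption)
df = pd.DataFrame({"x": ["ABCD", "DEFG"], "y": ["-", "-"]}, dtype=object)

replaced = df.replace("-", np.nan)

# Columns in which every value is missing
all_nan_cols = replaced.columns[replaced.isna().all()]

# Build a column -> dtype mapping for those columns only
to_object = dict.fromkeys(all_nan_cols, "object")

# Cast only the all-NaN columns back to object
fixed = replaced.astype(to_object)

print(to_object)
print(fixed.dtypes)
```

This only helps for columns that end up entirely NaN; a column with a mix of strings and NaN already stays object.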
Upvotes: 1