Reputation: 21961
I have a dataframe with a part of it shown as below:
2016-12-27 NaN
2016-12-28 NaN
2016-12-29 NaN
2016-12-30 NaN
2016-12-31 NaN
Name: var_name, dtype: object
The column contains NaN as strings/objects. How can I convert these to NumPy's nan instead? Ideally, I'd like to do this when reading in the CSV file.
Upvotes: 7
Views: 11721
Reputation: 136
df['var_name'] = df['var_name'].replace('NaN', pd.NA)
This simply replaces the 'NaN' string with pd.NA. Note that this uses the top-level .replace(), not the string method (that is, NOT .str.replace()). You could use np.nan in place of pd.NA and it would work nearly the same way.
When you first load a file, pandas will usually do these sorts of conversions by default. But if you know that a specific string form has ended up in your object column at some later point in the program, this is a good way to perform the conversion.
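A minimal runnable sketch of this approach, using the question's column name var_name plus one made-up numeric value:

```python
import numpy as np
import pandas as pd

# Column of 'NaN' strings plus one numeric string, mimicking the question
df = pd.DataFrame({'var_name': ['NaN', 'NaN', '1.5']})

# Top-level .replace() swaps the string for a real missing value
df['var_name'] = df['var_name'].replace('NaN', np.nan)

# The remaining entries are still strings; convert if numeric data is expected
df['var_name'] = pd.to_numeric(df['var_name'])
```

After this, the column has dtype float64 with genuine missing values instead of 'NaN' strings.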
Upvotes: 5
Reputation: 294218
I'd use the converters option in read_csv. In this case, we aim to convert the column in question to numeric values and treat everything else as numpy.nan, which includes the string 'NaN':
from io import StringIO
import pandas as pd

# txt holds the whitespace-delimited data shown in the question
txt = """2016-12-27 NaN
2016-12-28 NaN
2016-12-29 NaN
2016-12-30 NaN
2016-12-31 NaN"""

converter = lambda x: pd.to_numeric(x, 'coerce')
df = pd.read_csv(StringIO(txt), delim_whitespace=True, converters={1: converter}, header=None)
df.dtypes
0     object
1    float64
dtype: object
Upvotes: 2
Reputation: 2011
Yes, you can do this when reading the csv file.
df = pd.read_csv('test.csv', names=['t', 'v'], dtype={'v':np.float64})
Check the docs of pandas.read_csv; several of its parameters (such as na_values, dtype, and converters) are useful for your application.
Hope this helps.
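As a sketch of that read_csv call applied to the question's whitespace-delimited data (the column names 't' and 'v' are this answer's choice; pandas treats the literal string NaN as missing by default, which is why the float64 dtype parses cleanly):

```python
from io import StringIO
import numpy as np
import pandas as pd

csv_data = """2016-12-30 NaN
2016-12-31 NaN"""

# sep=r'\s+' splits on whitespace, matching the question's layout;
# passing names= makes pandas treat the first line as data, not a header
df = pd.read_csv(StringIO(csv_data), sep=r'\s+', names=['t', 'v'],
                 dtype={'v': np.float64})
```

Column 'v' comes out as float64 with real NaN values, while 't' stays as object strings.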
Upvotes: 1
Reputation: 103744
Suppose we have:
>>> df = pd.DataFrame({'col': ['NaN'] * 10})
You can use .apply to convert:
>>> new_df = df.apply(float, axis=1)
>>> type(new_df[0])
<type 'numpy.float64'>
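Note that calling float() on each one-element row relies on older pandas behavior and may raise on recent versions; a per-column astype is a drop-in alternative (a sketch, not the original answer's code):

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({'col': ['NaN'] * 10})

# astype(float) parses the 'NaN' strings into real numpy.float64 NaNs
new_df = df['col'].astype(float)
```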
Upvotes: 1