user308827

Reputation: 21961

Convert string-based NaNs to numpy NaNs

I have a dataframe with a part of it shown as below:

2016-12-27              NaN
2016-12-28              NaN
2016-12-29              NaN
2016-12-30              NaN
2016-12-31              NaN
Name: var_name, dtype: object

The column contains NaN as strings (dtype object). How can I convert these to numpy NaN values instead? Ideally, I would do the conversion when reading in the CSV file.

Upvotes: 7

Views: 11721

Answers (4)

GH KIM

Reputation: 136

df[var_name_replace] = df[var_name].replace('NaN', pd.NA)

This simply replaces the 'NaN' string object with pd.NA. This uses the top-level .replace(), not the string replace (that is, NOT .str.replace()).

You could use np.nan in the place of pd.NA and it would work nearly the same way.
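As a minimal sketch of the np.nan variant, assuming a small made-up column (the data and the names `replaced` and `as_float` are illustrative, not from the question): once the 'NaN' strings are replaced, casting to float should give a float64 column.

```python
import numpy as np
import pandas as pd

# Hypothetical data: a column holding the string 'NaN' and a numeric string
df = pd.DataFrame({"var_name": ["NaN", "1.5", "NaN"]})

# Top-level replace swaps whole cell values equal to 'NaN' for np.nan
replaced = df["var_name"].replace("NaN", np.nan)

# Casting to float turns the remaining numeric strings into floats
as_float = replaced.astype(float)
```

If the column can also contain other non-numeric strings, the cast will raise, so this fits best when 'NaN' is the only oddity.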

When you first load a file, pandas will usually perform this sort of conversion by default. But if you know that a specific string form of NaN ends up in an object column at some later point in the program, this is a good way to do the conversion.
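To illustrate the default behaviour on load, here is a small sketch using an in-memory CSV (the text in `csv_text` is made up for the example). With default settings, read_csv should treat the token 'NaN' as a missing value.

```python
from io import StringIO

import pandas as pd

# Hypothetical CSV text standing in for a file on disk
csv_text = "date,var_name\n2016-12-27,NaN\n2016-12-28,1.5\n"

# With default settings, read_csv treats the token 'NaN' as missing
df = pd.read_csv(StringIO(csv_text))
```

If the column still comes back as object after loading, it likely contains some other non-numeric token as well.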

Upvotes: 5

piRSquared

Reputation: 294218

I'd use the converters option in read_csv. Here, the aim is to convert the column in question to numeric values and treat everything else as numpy.nan, which includes the string 'NaN'.

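import pandas as pd
from io import StringIO
# txt is assumed to hold the raw whitespace-delimited text; non-numeric values (including 'NaN') become numpy.nan
converter = lambda x: pd.to_numeric(x, errors='coerce')
df = pd.read_csv(StringIO(txt), sep=r'\s+', converters={1: converter}, header=None)
df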

df.dtypes

0     object
1    float64
dtype: object
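The same coercion can also be applied to a column that is already loaded. A minimal sketch, assuming a made-up frame with a column named `col` (both names are illustrative):

```python
import pandas as pd

# Hypothetical object column mixing 'NaN', junk text and a numeric string
df = pd.DataFrame({"col": ["NaN", "abc", "3"]})

# errors='coerce' turns anything that cannot be parsed as a number into NaN
df["col"] = pd.to_numeric(df["col"], errors="coerce")
```

Note that this silently discards any non-numeric text, so it is worth checking the column first if that text might matter.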

Upvotes: 2

rojeeer

Reputation: 2011

Yes, you can do this when reading the csv file.

df = pd.read_csv('test.csv', names=['t', 'v'], dtype={'v':np.float64})

Check the docs for pandas.read_csv. Some parameters that are useful for your case:

  • names
  • dtype
  • na_values
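A rough sketch of how the three parameters might be combined, assuming a headerless file in which a custom marker also means "missing" (the marker 'missing' and the CSV text are invented for the example):

```python
from io import StringIO

import numpy as np
import pandas as pd

# Hypothetical headerless CSV; 'missing' is a made-up custom marker
csv_text = "2016-12-27,NaN\n2016-12-28,missing\n2016-12-29,2.0\n"

df = pd.read_csv(
    StringIO(csv_text),
    names=["t", "v"],         # column names, since the file has no header
    dtype={"v": np.float64},  # force the value column to float
    na_values=["missing"],    # extra tokens to treat as NaN, on top of the defaults
)
```

By default, na_values adds to the built-in list of missing-value tokens rather than replacing it, so 'NaN' is still recognised.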

Hope this helps.

Upvotes: 1

dawg

Reputation: 103744

Suppose we have:

>>> df=pd.DataFrame({'col':['NaN']*10})

You can use .apply to convert:

>>> new_col = df['col'].apply(float)
>>> type(new_col[0])
<class 'numpy.float64'>

Upvotes: 1
