user308827
user308827

Reputation: 21971

Replace value in any column in pandas dataframe

In the dataframe below:

 T2MN   T2MX   RH2M DFP2M RAIN 
 6.96   9.32  84.27  5.57    - 
 6.31  10.46  -      5.63    - 
  -    10.66  79.38  3.63    - 
 0.79   4.45  94.24  1.85    - 
 1.45   3.99  91.71  1.17    - 

How do I replace all the - with NaN's. I do not want to specify column names since I do not know before hand which column will have -

Upvotes: 4

Views: 1029

Answers (4)

MaxU - stand with Ukraine
MaxU - stand with Ukraine

Reputation: 210842

It looks like you were reading this data from CSV/FWF file... If it's true the easiest way to get rid of '-' would be to explain Pandas that it's NaN's representation:

df = pd.read_csv(filename, na_values=['NaN', 'nan', '-'])

Test:

In [79]: df
Out[79]:
   T2MN   T2MX   RH2M  DFP2M  RAIN
0  6.96   9.32  84.27   5.57   NaN
1  6.31  10.46    NaN   5.63   NaN
2   NaN  10.66  79.38   3.63   NaN
3  0.79   4.45  94.24   1.85   NaN
4  1.45   3.99  91.71   1.17   NaN

In [80]: df.dtypes
Out[80]:
T2MN     float64
T2MX     float64
RH2M     float64
DFP2M    float64
RAIN     float64
dtype: object

Upvotes: 1

piRSquared
piRSquared

Reputation: 294278

If those are strings, then your floats are probably also strings.

Assuming your dataframe is df, I'd try

pd.to_numeric(df.stack(), 'coerce').unstack()

Deeper explanation

Pandas doesn't usually represent missing floats with '-'. Therefore, that '-' must be a string. Thus, the dtype of any column with a '-' in it, must be 'object'. That makes it highly likely that whatever parsed the data, left the floats as string.

setup

from io import StringIO
import pandas as pd

txt = """T2MN   T2MX   RH2M DFP2M RAIN 
 6.96   9.32  84.27  5.57    - 
 6.31  10.46  -      5.63    - 
  -    10.66  79.38  3.63    - 
 0.79   4.45  94.24  1.85    - 
 1.45   3.99  91.71  1.17    - """

df = pd.read_csv(StringIO(txt), delim_whitespace=True)
print(df)

   T2MN   T2MX   RH2M  DFP2M RAIN
0  6.96   9.32  84.27   5.57    -
1  6.31  10.46      -   5.63    -
2     -  10.66  79.38   3.63    -
3  0.79   4.45  94.24   1.85    -
4  1.45   3.99  91.71   1.17    -

What are the dtypes?

print(df.dtypes)

T2MN      object
T2MX     float64
RH2M      object
DFP2M    float64
RAIN      object
dtype: object

What is the type of the first element?

print(type(df.iloc[0, 0]))

<class 'str'>

This means that any column with a '-' is like a column of strings that look like floats. You want to use pd.to_numeric with parameter errors='coerce' to force non-numeric items to np.nan. However, pd.to_numeric does not operate on a pd.DataFrame so we stack and unstack.

pd.to_numeric(df.stack(), 'coerce').unstack()

   T2MN   T2MX   RH2M  DFP2M  RAIN
0  6.96   9.32  84.27   5.57   NaN
1  6.31  10.46    NaN   5.63   NaN
2   NaN  10.66  79.38   3.63   NaN
3  0.79   4.45  94.24   1.85   NaN
4  1.45   3.99  91.71   1.17   NaN

Upvotes: 5

Alex Fung
Alex Fung

Reputation: 2006

I think you want the actual numpy.nan instead of a string NaN as you can use a lot of methods such as fillna/isnull/notnull on the pandas.Series/pandas.DataFrame

import pandas as pd

import numpy as np

df = pd.DataFrame([['-']*10]*10)

df = df.replace('-',np.nan)

Upvotes: 2

chrisaycock
chrisaycock

Reputation: 37930

Just replace() the string:

In [10]: df.replace('-', 'NaN')
Out[10]: 
   T2MN   T2MX   RH2M  DFP2M RAIN
0  6.96   9.32  84.27   5.57  NaN
1  6.31  10.46    NaN   5.63  NaN
2   NaN  10.66  79.38   3.63  NaN
3  0.79   4.45  94.24   1.85  NaN
4  1.45   3.99  91.71   1.17  NaN

Upvotes: 5

Related Questions