user7786493
user7786493

Reputation: 473

Convert non-number to NaN and conditionally forward fill in Pandas

My objective is to first fill any values of the below DataFrame that are NOT numbers (e.g. string) to NaN, and then only forward-fill those Nan's where any valid number of that column already appeared before. In other words, there should not be any NaNs contained between numbers of each column. Appreciate any answer.

This is the example of the question:

Date       Open       High   Low      Close
1/1/2015   529.795    peter    peter    peter
1/2/2015   527.561    peter  522.665  523.373
1/5/2015   521.827    peter  511.655  512.463
1/6/2015   513.590    peter  499.678  500.585
1/7/2015   505.612  505.855  498.282  499.728
1/8/2015   496.626  502.101  HELLO    501.303
1/9/2015   503.378  503.537  493.435  494.811
1/12/2015  493.585  494.618  486.225  491.201
1/13/2015  497.474  501.603  491.042  494.821
1/14/2015  493.295  501.852  491.65   499.498
1/15/2015  504.186  504.295  496.397  PETER
1/16/2015  498.641  506.798  498.631  506.689
1/19/2015  498.641  506.798  498.631  506.689
1/20/2015  509.601  511.097   504.63  505.512        

And my desired output is as follows:

date        open     high      low    close
1/1/2015   529.795    NaN    NaN      NaN
1/2/2015   527.561    NaN    522.665  523.373
1/5/2015   521.827    NaN    511.655  512.463
1/6/2015   513.590    NaN    499.678  500.585
1/7/2015   505.612  505.855  498.282  499.728
1/8/2015   496.626  502.101  498.282  501.303
1/9/2015   503.378  503.537  493.435  494.811
1/12/2015  493.585  494.618  486.225  491.201
1/13/2015  497.474  501.603  491.042  494.821
1/14/2015  493.295  501.852   491.65  499.498
1/15/2015  504.186  504.295  496.397  499.498
1/16/2015  498.641  506.798  498.631  506.689
1/19/2015  498.641  506.798  498.631  506.689
1/20/2015  509.601  511.097   504.63  505.512
1/21/2015  505.861  517.858  504.814  516.621
1/22/2015  520.052  534.861  518.277  532.927
1/23/2015  534.123  540.685   531.54  538.471
1/26/2015  537.055  537.524   528.22  533.744
1/27/2015  528.519  529.247  516.771   517.21
1/28/2015  521.348  521.558  508.603  508.603

Upvotes: 0

Views: 168

Answers (1)

Scott Boston
Scott Boston

Reputation: 153460

Use set_index,pd.to_numeric with errors='coerce', and ffill():

df.set_index('Date').apply(pd.to_numeric,errors='coerce').ffill().reset_index()

@cmaher's solution is probably faster:

df.iloc[:,1:] = df.iloc[:,1:].apply(pd.to_numeric, errors='coerce').ffill()

Output:

         Date     Open     High      Low    Close
0    1/1/2015  529.795      NaN      NaN      NaN
1    1/2/2015  527.561      NaN  522.665  523.373
2    1/5/2015  521.827      NaN  511.655  512.463
3    1/6/2015  513.590      NaN  499.678  500.585
4    1/7/2015  505.612  505.855  498.282  499.728
5    1/8/2015  496.626  502.101  498.282  501.303
6    1/9/2015  503.378  503.537  493.435  494.811
7   1/12/2015  493.585  494.618  486.225  491.201
8   1/13/2015  497.474  501.603  491.042  494.821
9   1/14/2015  493.295  501.852  491.650  499.498
10  1/15/2015  504.186  504.295  496.397  499.498
11  1/16/2015  498.641  506.798  498.631  506.689
12  1/19/2015  498.641  506.798  498.631  506.689
13  1/20/2015  509.601  511.097  504.630  505.512

Upvotes: 1

Related Questions