Reputation: 473
My objective is to first fill any values of the below DataFrame that are NOT numbers (e.g. string) to NaN, and then only forward-fill those Nan's where any valid number of that column already appeared before. In other words, there should not be any NaNs contained between numbers of each column. Appreciate any answer.
This is the example of the question:
Date Open High Low Close
1/1/2015 529.795 peter peter peter
1/2/2015 527.561 peter 522.665 523.373
1/5/2015 521.827 peter 511.655 512.463
1/6/2015 513.590 peter 499.678 500.585
1/7/2015 505.612 505.855 498.282 499.728
1/8/2015 496.626 502.101 HELLO 501.303
1/9/2015 503.378 503.537 493.435 494.811
1/12/2015 493.585 494.618 486.225 491.201
1/13/2015 497.474 501.603 491.042 494.821
1/14/2015 493.295 501.852 491.65 499.498
1/15/2015 504.186 504.295 496.397 PETER
1/16/2015 498.641 506.798 498.631 506.689
1/19/2015 498.641 506.798 498.631 506.689
1/20/2015 509.601 511.097 504.63 505.512
And my desired output is as follows:
date open high low close
1/1/2015 529.795 NaN NaN NaN
1/2/2015 527.561 NaN 522.665 523.373
1/5/2015 521.827 NaN 511.655 512.463
1/6/2015 513.590 NaN 499.678 500.585
1/7/2015 505.612 505.855 498.282 499.728
1/8/2015 496.626 502.101 498.282 501.303
1/9/2015 503.378 503.537 493.435 494.811
1/12/2015 493.585 494.618 486.225 491.201
1/13/2015 497.474 501.603 491.042 494.821
1/14/2015 493.295 501.852 491.65 499.498
1/15/2015 504.186 504.295 496.397 499.498
1/16/2015 498.641 506.798 498.631 506.689
1/19/2015 498.641 506.798 498.631 506.689
1/20/2015 509.601 511.097 504.63 505.512
1/21/2015 505.861 517.858 504.814 516.621
1/22/2015 520.052 534.861 518.277 532.927
1/23/2015 534.123 540.685 531.54 538.471
1/26/2015 537.055 537.524 528.22 533.744
1/27/2015 528.519 529.247 516.771 517.21
1/28/2015 521.348 521.558 508.603 508.603
Upvotes: 0
Views: 168
Reputation: 153460
Use set_index
,pd.to_numeric
with errors='coerce'
, and ffill()
:
df.set_index('Date').apply(pd.to_numeric,errors='coerce').ffill().reset_index()
@cmaher's solution is probably faster:
df.iloc[:,1:] = df.iloc[:,1:].apply(pd.to_numeric, errors='coerce').ffill()
Output:
Date Open High Low Close
0 1/1/2015 529.795 NaN NaN NaN
1 1/2/2015 527.561 NaN 522.665 523.373
2 1/5/2015 521.827 NaN 511.655 512.463
3 1/6/2015 513.590 NaN 499.678 500.585
4 1/7/2015 505.612 505.855 498.282 499.728
5 1/8/2015 496.626 502.101 498.282 501.303
6 1/9/2015 503.378 503.537 493.435 494.811
7 1/12/2015 493.585 494.618 486.225 491.201
8 1/13/2015 497.474 501.603 491.042 494.821
9 1/14/2015 493.295 501.852 491.650 499.498
10 1/15/2015 504.186 504.295 496.397 499.498
11 1/16/2015 498.641 506.798 498.631 506.689
12 1/19/2015 498.641 506.798 498.631 506.689
13 1/20/2015 509.601 511.097 504.630 505.512
Upvotes: 1