Reputation: 420
I know this question has been asked several times before, but I am seeing strange behaviour, hence the question.
Input df
A B C
USA 21-07-2018
USA 22-07-2018
USA 23-07-2018 1
USA 24-07-2018 1
USA 25-07-2018 1
USA 26-07-2018 1
USA 27-07-2018 1
USA 28-07-2018
USA 29-07-2018
USA 30-07-2018 1
USA 31-07-2018 1
USA 01-08-2018 1
USA 02-08-2018 1
USA 03-08-2018 1
USA 04-08-2018
USA 05-08-2018
USA 06-08-2018 1
USA 07-08-2018 1
USA 08-08-2018 1
USA 09-08-2018 1
USA 10-08-2018 1
USA 11-08-2018
USA 12-08-2018
USA 13-08-2018 1
USA 14-08-2018 1
USA 15-08-2018 1
USA 16-08-2018 1
USA 17-08-2018 1
USA 18-08-2018
USA 19-08-2018
I tried the following two methods:
1st Method
df['C'] = df['C'].fillna(method='ffill')
2nd Method
df['C'] = df['C'].ffill()
Both of them produced the same dataframe (Output_df):
A B C
USA 21-07-2017 1
USA 22-07-2017 3010.77
USA 23-07-2017 3010.77
USA 24-07-2017 1
USA 25-07-2017 1
USA 26-07-2017 1
USA 27-07-2017 1
USA 28-07-2017 1
USA 29-07-2017 2995.23
USA 30-07-2017 2995.23
USA 31-07-2017 1
USA 01-08-2017 1
USA 02-08-2017 1
USA 03-08-2017 1
USA 04-08-2017 1
USA 05-08-2017 2974.39
USA 06-08-2017 2974.39
USA 07-08-2017 1
USA 08-08-2017 1
USA 09-08-2017 1
USA 10-08-2017 1
USA 11-08-2017 1
Why am I getting values like 3010.77, 2974.39, etc.? Is something being averaged somewhere (the input df is quite large, more than 25k rows)?
What I expected it to be (Expected_df):
A B C
USA 21-07-2018 1
USA 22-07-2018 1
USA 23-07-2018 1
USA 24-07-2018 1
USA 25-07-2018 1
USA 26-07-2018 1
USA 27-07-2018 1
USA 28-07-2018 1
USA 29-07-2018 1
USA 30-07-2018 1
USA 31-07-2018 1
USA 01-08-2018 1
USA 02-08-2018 1
USA 03-08-2018 1
USA 04-08-2018 1
USA 05-08-2018 1
USA 06-08-2018 1
USA 07-08-2018 1
USA 08-08-2018 1
USA 09-08-2018 1
USA 10-08-2018 1
USA 11-08-2018 1
USA 12-08-2018 1
USA 13-08-2018 1
USA 14-08-2018 1
USA 15-08-2018 1
USA 16-08-2018 1
USA 17-08-2018 1
USA 18-08-2018 1
USA 19-08-2018 1
Just to give another example of my expected output
Input df
A B C
AUS 21-07-2017 1.262584
AUS 22-07-2017
AUS 23-07-2017
AUS 24-07-2017 1.258671
AUS 25-07-2017 1.256456
AUS 26-07-2017 1.263913
AUS 27-07-2017 1.249957
AUS 28-07-2017 1.256032
AUS 29-07-2017
AUS 30-07-2017
AUS 31-07-2017 1.254626
AUS 01-08-2017 1.254064
AUS 02-08-2017 1.255136
AUS 03-08-2017 1.259949
AUS 04-08-2017 1.254466
AUS 05-08-2017
AUS 06-08-2017
AUS 07-08-2017 1.263796
AUS 08-08-2017 1.259692
AUS 09-08-2017 1.268349
AUS 10-08-2017 1.269008
AUS 11-08-2017 1.271738
Expected output df
A B C
AUS 21-07-2017 1.262584
AUS 22-07-2017 1.262584
AUS 23-07-2017 1.262584
AUS 24-07-2017 1.258671
AUS 25-07-2017 1.256456
AUS 26-07-2017 1.263913
AUS 27-07-2017 1.249957
AUS 28-07-2017 1.256032
AUS 29-07-2017 1.256032
AUS 30-07-2017 1.256032
AUS 31-07-2017 1.254626
AUS 01-08-2017 1.254064
AUS 02-08-2017 1.255136
AUS 03-08-2017 1.259949
AUS 04-08-2017 1.254466
AUS 05-08-2017 1.254466
AUS 06-08-2017 1.254466
AUS 07-08-2017 1.263796
AUS 08-08-2017 1.259692
AUS 09-08-2017 1.268349
AUS 10-08-2017 1.269008
AUS 11-08-2017 1.271738
Upvotes: 0
Views: 2295
Reputation: 34086
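I think you have whitespace in your column. ffill() only fills cells that are actually missing (NaN), so blank or whitespace-only strings are left as they are. You need to replace those with numpy.nan.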
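One way to check whether this is the problem is sketched below. The small dataframe is made up for illustration (it is not the asker's data), and the names n_missing and n_blank are just placeholders. The idea is to compare how many cells pandas considers missing with how many cells are empty or whitespace-only strings.

```python
import pandas as pd

# Hypothetical sample: the "empty" cells in C are strings, not NaN.
df = pd.DataFrame({
    "A": ["USA", "USA", "USA", "USA"],
    "B": ["21-07-2018", "22-07-2018", "23-07-2018", "24-07-2018"],
    "C": [" ", "", "1", "1"],
})

# A non-numeric dtype on a column expected to hold numbers hints at strings.
print(df["C"].dtype)

# isna() does not treat empty or whitespace-only strings as missing.
n_missing = int(df["C"].isna().sum())

# Count cells that are empty once surrounding whitespace is stripped.
is_blank = df["C"].astype(str).str.strip().eq("")
n_blank = int(is_blank.sum())

print(n_missing, n_blank)
```

If n_blank is greater than zero while n_missing is zero, the blanks are strings and ffill() has nothing to fill.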
If you are unsure how many blank characters each cell contains, you can do:
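import numpy as np
# Assign the result back; inplace=True on a column selection may not update df in newer pandas versions
df['C'] = df['C'].replace(r'^\s*$', np.nan, regex=True)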
Then use ffill() for the expected behaviour.
df['C'] = df['C'].ffill()
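Putting the steps together, here is a minimal end-to-end sketch. It assumes the blanks are empty or whitespace-only strings and that the remaining values are numbers stored as text. The sample rows are adapted from the AUS example in the question, and the pd.to_numeric step is an extra assumption added so the column ends up as floats.

```python
import numpy as np
import pandas as pd

# Hypothetical sample: numbers stored as text, with blank and whitespace-only gaps.
df = pd.DataFrame({
    "A": ["AUS"] * 5,
    "B": ["21-07-2017", "22-07-2017", "23-07-2017", "24-07-2017", "25-07-2017"],
    "C": ["1.262584", " ", "", "1.258671", "1.256456"],
})

# Turn empty / whitespace-only cells into NaN and assign the result back.
df["C"] = df["C"].replace(r"^\s*$", np.nan, regex=True)

# Convert the remaining text values to floats (NaN stays NaN).
df["C"] = pd.to_numeric(df["C"])

# Forward-fill: each NaN takes the last valid value above it.
df["C"] = df["C"].ffill()

result = df["C"].tolist()
print(df)
```

Assigning the result back, rather than using inplace=True on df['C'], avoids depending on whether the column selection is a view or a copy.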
Upvotes: 2