Reputation: 424
I have a dataset where are middle values for certain periods and I would like to copy this value on nth rows before and nth rows after. In my case the period has 15 days, so 7 days after and 7 before from middle values. How can I do it?
Well, I've checked lots of books and webpages and there are lots of references for fillna however, none of these can solve my issue. So, I haven't tried any code yet.
There is my dataset
DATE RAIN RR_MIDDLE CONDITION_RR CONDITION_PR SEASON
0 1983-07-22 0.000 0.00 Dry Dry Dry_Season
1 1983-07-23 NaN NaN NaN NaN NaN
2 1983-07-24 NaN NaN NaN NaN NaN
.....................................................................
15 1983-08-06 0.000 0.00 Wet Wet Wet_Season
I expected filled tables of same values like middle one in one season (period).
DATE RAIN RR_MIDDLE CONDITION_RR CONDITION_PR SEASON
0 1983-07-22 0.000 0.00 Dry Dry Dry_Season
1 1983-07-23 0.000 0.00 Dry Dry Dry_Season
2 1983-07-24 0.000 0.00 Dry Dry Dry_Season
3 1983-07-25 0.000 0.00 Dry Dry Dry_Season
4 1983-07-26 0.000 0.00 Dry Dry Dry_Season
5 1983-07-27 0.000 0.00 Dry Dry Dry_Season
6 1983-07-28 0.000 0.00 Dry Dry Dry_Season
7 1983-07-29 0.000 0.00 Dry Dry Dry_Season
8 1983-07-30 0.000 0.00 Wet Wet Wet_Season
9 1983-07-31 0.000 0.00 Wet Wet Wet_Season
10 1983-08-01 0.000 0.00 Wet Wet Wet_Season
11 1983-08-02 0.000 0.00 Wet Wet Wet_Season
12 1983-08-03 0.000 0.00 Wet Wet Wet_Season
13 1983-08-04 0.000 0.00 Wet Wet Wet_Season
14 1983-08-05 0.000 0.00 Wet Wet Wet_Season
15 1983-08-06 0.000 0.00 Wet Wet Wet_Season
16 1983-08-07 0.000 0.00 Wet Wet Wet_Season
And so on.....
Upvotes: 1
Views: 80
Reputation: 59549
If you know the number of NaN
s to fill beforehand and it's the same throughout your dataset, the simplest solution is the limit
argument of both fillings:
df.ffill(limit=7).bfill(limit=7)
DATE RAIN RR_MIDDLE CONDITION_RR CONDITION_PR SEASON
0 1983-07-22 0.0 0.0 Dry Dry Dry_Season
1 1983-07-23 0.0 0.0 Dry Dry Dry_Season
2 1983-07-24 0.0 0.0 Dry Dry Dry_Season
3 1983-07-25 0.0 0.0 Dry Dry Dry_Season
4 1983-07-26 0.0 0.0 Dry Dry Dry_Season
5 1983-07-27 0.0 0.0 Dry Dry Dry_Season
6 1983-07-28 0.0 0.0 Dry Dry Dry_Season
7 1983-07-29 0.0 0.0 Dry Dry Dry_Season
8 1983-07-30 0.0 0.0 Wet Wet Wet_Season
9 1983-07-31 0.0 0.0 Wet Wet Wet_Season
10 1983-08-01 0.0 0.0 Wet Wet Wet_Season
11 1983-08-02 0.0 0.0 Wet Wet Wet_Season
12 1983-08-03 0.0 0.0 Wet Wet Wet_Season
13 1983-08-04 0.0 0.0 Wet Wet Wet_Season
14 1983-08-05 0.0 0.0 Wet Wet Wet_Season
15 1983-08-06 0.0 0.0 Wet Wet Wet_Season
Otherwise, you need to interpolate
with nearest
; however that only works for numeric types. So we need to transform each column, interpolate, and transform back.
str_cols = ['CONDITION_RR', 'CONDITION_PR', 'SEASON']
d = {} # Holds mapping from str values to integers
for col in str_cols:
u = df[col].dropna().unique()
d[col] = dict(zip(u, range(len(u))))
df[col] = df[col].map(d[col]) # Map unique values to integers
df = df.apply(pd.Series.interpolate, method='nearest')
# Map back
for col in str_cols:
rev_d = {v:k for k,v in d[col].items()}
df[col] = df[col].map(rev_d)
Upvotes: 1