Luckasino
Luckasino

Reputation: 424

Is there an easy python approach to fill missing values nth before and nth after

I have a dataset where are middle values for certain periods and I would like to copy this value on nth rows before and nth rows after. In my case the period has 15 days, so 7 days after and 7 before from middle values. How can I do it?

Well, I've checked lots of books and webpages and there are lots of references for fillna however, none of these can solve my issue. So, I haven't tried any code yet.

There is my dataset

    DATE        RAIN    RR_MIDDLE   CONDITION_RR    CONDITION_PR    SEASON
 0  1983-07-22  0.000   0.00        Dry             Dry            Dry_Season
 1  1983-07-23  NaN      NaN         NaN            NaN            NaN
 2  1983-07-24  NaN      NaN         NaN            NaN            NaN
 .....................................................................
15  1983-08-06  0.000   0.00         Wet            Wet            Wet_Season

I expected filled tables of same values like middle one in one season (period).

 DATE           RAIN    RR_MIDDLE   CONDITION_RR    CONDITION_PR    SEASON
0   1983-07-22  0.000   0.00        Dry             Dry            Dry_Season
1   1983-07-23  0.000   0.00        Dry             Dry            Dry_Season
2   1983-07-24  0.000   0.00        Dry             Dry            Dry_Season
3   1983-07-25  0.000   0.00        Dry             Dry            Dry_Season
4   1983-07-26  0.000   0.00        Dry             Dry            Dry_Season
5   1983-07-27  0.000   0.00        Dry             Dry            Dry_Season
6   1983-07-28  0.000   0.00        Dry             Dry            Dry_Season
7   1983-07-29  0.000   0.00        Dry             Dry            Dry_Season
8   1983-07-30  0.000   0.00        Wet             Wet            Wet_Season
9   1983-07-31  0.000   0.00        Wet             Wet            Wet_Season
10  1983-08-01  0.000   0.00        Wet             Wet            Wet_Season
11  1983-08-02  0.000   0.00        Wet             Wet            Wet_Season
12  1983-08-03  0.000   0.00        Wet             Wet            Wet_Season
13  1983-08-04  0.000   0.00        Wet             Wet            Wet_Season
14  1983-08-05  0.000   0.00        Wet             Wet            Wet_Season
15  1983-08-06  0.000   0.00        Wet             Wet            Wet_Season
16  1983-08-07  0.000   0.00        Wet             Wet            Wet_Season
And so on.....

Upvotes: 1

Views: 80

Answers (1)

ALollz
ALollz

Reputation: 59549

If you know the number of NaNs to fill beforehand and it's the same throughout your dataset, the simplest solution is the limit argument of both fillings:

df.ffill(limit=7).bfill(limit=7)

          DATE  RAIN  RR_MIDDLE CONDITION_RR CONDITION_PR      SEASON
0   1983-07-22   0.0        0.0          Dry          Dry  Dry_Season
1   1983-07-23   0.0        0.0          Dry          Dry  Dry_Season
2   1983-07-24   0.0        0.0          Dry          Dry  Dry_Season
3   1983-07-25   0.0        0.0          Dry          Dry  Dry_Season
4   1983-07-26   0.0        0.0          Dry          Dry  Dry_Season
5   1983-07-27   0.0        0.0          Dry          Dry  Dry_Season
6   1983-07-28   0.0        0.0          Dry          Dry  Dry_Season
7   1983-07-29   0.0        0.0          Dry          Dry  Dry_Season
8   1983-07-30   0.0        0.0          Wet          Wet  Wet_Season
9   1983-07-31   0.0        0.0          Wet          Wet  Wet_Season
10  1983-08-01   0.0        0.0          Wet          Wet  Wet_Season
11  1983-08-02   0.0        0.0          Wet          Wet  Wet_Season
12  1983-08-03   0.0        0.0          Wet          Wet  Wet_Season
13  1983-08-04   0.0        0.0          Wet          Wet  Wet_Season
14  1983-08-05   0.0        0.0          Wet          Wet  Wet_Season
15  1983-08-06   0.0        0.0          Wet          Wet  Wet_Season

Otherwise, you need to interpolate with nearest; however that only works for numeric types. So we need to transform each column, interpolate, and transform back.

str_cols = ['CONDITION_RR', 'CONDITION_PR', 'SEASON']

d = {}  # Holds mapping from str values to integers
for col in str_cols:
    u = df[col].dropna().unique()
    d[col] = dict(zip(u, range(len(u))))
    df[col] = df[col].map(d[col])  # Map unique values to integers

df = df.apply(pd.Series.interpolate, method='nearest')

# Map back
for col in str_cols:
    rev_d = {v:k for k,v in d[col].items()}
    df[col] = df[col].map(rev_d)

Upvotes: 1

Related Questions