Filling blanks after merge in MultiIndex timeseries

Question

I have multiple MultiIndex timeseries that are merged together and the index is ("location", "timestamp"). Example:

location   timestamp     x     y   z
       A           1   1.2   4.3  NaN
       A           2   2.2   5.3  NaN
       A           3   NaN   NaN  1.0
       A           4   3.2   7.3  NaN
...
       B           1   2.2   4.3  NaN
       B           2   3.2   5.3  NaN
       B           3   NaN   NaN  2.0
       B           4   5.2   7.3  NaN
...

"timestamp" is a real datetime column, I am just using integers here for simplicity. I am trying to fill up missing data so each row has full set of numbers. "x", "y" and "z" as smooth unrelated to each other dataseries, so missing values can be derived vertically either linearly or at least by repetition of last known number.

I have tried different groupby/resample expressions and failed. The only thing left is to disassemble everything into Series and perform sampling on them. There has to be a better way

Zero · Accepted Answer

Here are couple of ways to do it

You could ffill() forward fill previous value.

In [56]: df.ffill()
Out[56]:
  location  timestamp    x    y   z
0        A          1  1.2  4.3 NaN
1        A          2  2.2  5.3 NaN
2        A          3  2.2  5.3   1
3        A          4  3.2  7.3   1
4        B          1  2.2  4.3   1
5        B          2  3.2  5.3   1
6        B          3  3.2  5.3   2
7        B          4  5.2  7.3   2

However, if something starts with NaN, it stays. So fillna(0) with zero then.

In [57]: df.ffill().fillna(0)
Out[57]:
  location  timestamp    x    y  z
0        A          1  1.2  4.3  0
1        A          2  2.2  5.3  0
2        A          3  2.2  5.3  1
3        A          4  3.2  7.3  1
4        B          1  2.2  4.3  1
5        B          2  3.2  5.3  1
6        B          3  3.2  5.3  2
7        B          4  5.2  7.3  2

Instead of ffill(), you could also, interpolate the values, using pd.Series.interpolate

In [58]: df.apply(pd.Series.interpolate).fillna(0)
Out[58]:
  location  timestamp    x    y     z
0        A          1  1.2  4.3  0.00
1        A          2  2.2  5.3  0.00
2        A          3  2.7  6.3  1.00
3        A          4  3.2  7.3  1.25
4        B          1  2.2  4.3  1.50
5        B          2  3.2  5.3  1.75
6        B          3  4.2  6.3  2.00
7        B          4  5.2  7.3  2.00

And, instead of fillna(0), you could do the reverse of ffill() with bfill()

In [59]: df.apply(pd.Series.interpolate).bfill()
Out[59]:
  location  timestamp    x    y     z
0        A          1  1.2  4.3  1.00
1        A          2  2.2  5.3  1.00
2        A          3  2.7  6.3  1.00
3        A          4  3.2  7.3  1.25
4        B          1  2.2  4.3  1.50
5        B          2  3.2  5.3  1.75
6        B          3  4.2  6.3  2.00
7        B          4  5.2  7.3  2.00

Filling blanks after merge in MultiIndex timeseries

Answers (1)

Related Questions