i.petruk
i.petruk

Reputation: 1316

Filling blanks after merge in MultiIndex timeseries

I have multiple MultiIndex timeseries that are merged together and the index is ("location", "timestamp"). Example:

location   timestamp     x     y   z
       A           1   1.2   4.3  NaN
       A           2   2.2   5.3  NaN
       A           3   NaN   NaN  1.0
       A           4   3.2   7.3  NaN
...
       B           1   2.2   4.3  NaN
       B           2   3.2   5.3  NaN
       B           3   NaN   NaN  2.0
       B           4   5.2   7.3  NaN
...

"timestamp" is a real datetime column, I am just using integers here for simplicity. I am trying to fill up missing data so each row has full set of numbers. "x", "y" and "z" as smooth unrelated to each other dataseries, so missing values can be derived vertically either linearly or at least by repetition of last known number.

I have tried different groupby/resample expressions and failed. The only thing left is to disassemble everything into Series and perform sampling on them. There has to be a better way

Upvotes: 0

Views: 109

Answers (1)

Zero
Zero

Reputation: 76917

Here are couple of ways to do it

You could ffill() forward fill previous value.

In [56]: df.ffill()
Out[56]:
  location  timestamp    x    y   z
0        A          1  1.2  4.3 NaN
1        A          2  2.2  5.3 NaN
2        A          3  2.2  5.3   1
3        A          4  3.2  7.3   1
4        B          1  2.2  4.3   1
5        B          2  3.2  5.3   1
6        B          3  3.2  5.3   2
7        B          4  5.2  7.3   2

However, if something starts with NaN, it stays. So fillna(0) with zero then.

In [57]: df.ffill().fillna(0)
Out[57]:
  location  timestamp    x    y  z
0        A          1  1.2  4.3  0
1        A          2  2.2  5.3  0
2        A          3  2.2  5.3  1
3        A          4  3.2  7.3  1
4        B          1  2.2  4.3  1
5        B          2  3.2  5.3  1
6        B          3  3.2  5.3  2
7        B          4  5.2  7.3  2

Instead of ffill(), you could also, interpolate the values, using pd.Series.interpolate

In [58]: df.apply(pd.Series.interpolate).fillna(0)
Out[58]:
  location  timestamp    x    y     z
0        A          1  1.2  4.3  0.00
1        A          2  2.2  5.3  0.00
2        A          3  2.7  6.3  1.00
3        A          4  3.2  7.3  1.25
4        B          1  2.2  4.3  1.50
5        B          2  3.2  5.3  1.75
6        B          3  4.2  6.3  2.00
7        B          4  5.2  7.3  2.00

And, instead of fillna(0), you could do the reverse of ffill() with bfill()

In [59]: df.apply(pd.Series.interpolate).bfill()
Out[59]:
  location  timestamp    x    y     z
0        A          1  1.2  4.3  1.00
1        A          2  2.2  5.3  1.00
2        A          3  2.7  6.3  1.00
3        A          4  3.2  7.3  1.25
4        B          1  2.2  4.3  1.50
5        B          2  3.2  5.3  1.75
6        B          3  4.2  6.3  2.00
7        B          4  5.2  7.3  2.00

Upvotes: 1

Related Questions