Reputation: 1316
I have multiple MultiIndex timeseries that are merged together and the index is ("location", "timestamp"). Example:
location timestamp x y z
A 1 1.2 4.3 NaN
A 2 2.2 5.3 NaN
A 3 NaN NaN 1.0
A 4 3.2 7.3 NaN
...
B 1 2.2 4.3 NaN
B 2 3.2 5.3 NaN
B 3 NaN NaN 2.0
B 4 5.2 7.3 NaN
...
"timestamp" is a real datetime column, I am just using integers here for simplicity. I am trying to fill up missing data so each row has full set of numbers. "x", "y" and "z" as smooth unrelated to each other dataseries, so missing values can be derived vertically either linearly or at least by repetition of last known number.
I have tried different groupby/resample expressions and failed. The only thing left is to disassemble everything into Series and perform sampling on them. There has to be a better way
Upvotes: 0
Views: 109
Reputation: 76917
Here are couple of ways to do it
You could ffill()
forward fill previous value.
In [56]: df.ffill()
Out[56]:
location timestamp x y z
0 A 1 1.2 4.3 NaN
1 A 2 2.2 5.3 NaN
2 A 3 2.2 5.3 1
3 A 4 3.2 7.3 1
4 B 1 2.2 4.3 1
5 B 2 3.2 5.3 1
6 B 3 3.2 5.3 2
7 B 4 5.2 7.3 2
However, if something starts with NaN
, it stays. So fillna(0)
with zero then.
In [57]: df.ffill().fillna(0)
Out[57]:
location timestamp x y z
0 A 1 1.2 4.3 0
1 A 2 2.2 5.3 0
2 A 3 2.2 5.3 1
3 A 4 3.2 7.3 1
4 B 1 2.2 4.3 1
5 B 2 3.2 5.3 1
6 B 3 3.2 5.3 2
7 B 4 5.2 7.3 2
Instead of ffill()
, you could also, interpolate the values, using pd.Series.interpolate
In [58]: df.apply(pd.Series.interpolate).fillna(0)
Out[58]:
location timestamp x y z
0 A 1 1.2 4.3 0.00
1 A 2 2.2 5.3 0.00
2 A 3 2.7 6.3 1.00
3 A 4 3.2 7.3 1.25
4 B 1 2.2 4.3 1.50
5 B 2 3.2 5.3 1.75
6 B 3 4.2 6.3 2.00
7 B 4 5.2 7.3 2.00
And, instead of fillna(0)
, you could do the reverse of ffill()
with bfill()
In [59]: df.apply(pd.Series.interpolate).bfill()
Out[59]:
location timestamp x y z
0 A 1 1.2 4.3 1.00
1 A 2 2.2 5.3 1.00
2 A 3 2.7 6.3 1.00
3 A 4 3.2 7.3 1.25
4 B 1 2.2 4.3 1.50
5 B 2 3.2 5.3 1.75
6 B 3 4.2 6.3 2.00
7 B 4 5.2 7.3 2.00
Upvotes: 1