Efficient and elegant merge of time series data in numpy

Question

I have two (or more but two is enough to solve the core problem) time series in the form of a list of pairs (time, value):

Series x
time    0  2  4  5  8  9
value  x0 x2 x4 x5 x8 x9

Series y
time    0  2  3  5  7  8
value  y0 y2 y3 y5 y7 y8

What I want to get is a single time series of tuples from the two original series in such a way that where the times align, the values at that time are used and where they do not, the value from the last time that was available is used:

time     0  2  3  4  5  7  8  9
x vals  x0 x2 x2 x4 x5 x5 x8 x9
y vals  y0 y2 y3 y3 y5 y7 y8 y8

In my concrete environment the two time series are 2-D numpy arrays where in one row holds the times and the other row holds the values. Also it can be assumed that all the series start at the same time (e.g. 0) so they always align at the start.

This can, of course, be solved "manually" by a loop with "pointers" to the current position in the series and moving the pointers. However, is there a simple and elegant way of doing this in numpy?

Quang Hoang · Accepted Answer

You do want pandas for this:

df1 = pd.DataFrame({'time':[0,2,4,5,8,9],
                    'value':['x0','x2','x4','x5','x8','x9']})


df2 = pd.DataFrame({'time':[0,2,3,5,7,8],
                    'value':['y0','y2','y3','y5','y7','y8']})

df1.merge(df2, on='time', how='outer').sort_values('time').ffill()

Output:

   time value_x value_y
0     0      x0      y0
1     2      x2      y2
6     3      x2      y3
2     4      x4      y3
3     5      x5      y5
7     7      x5      y7
4     8      x8      y8
5     9      x9      y8

Efficient and elegant merge of time series data in numpy

Answers (1)

Related Questions