Ulf Aslak

Reputation: 8628

Simple way of stacking arrays with index offset

I have a number of time series, each containing measurements across weeks of the year, but not all of them start and end on the same weeks. I know the offsets, that is, I know in which weeks each one starts and ends. Now I would like to combine them into a matrix that respects these offsets, so that all values align with the correct week numbers.

Say each series runs along a row and the columns line up with the weeks. Take two series a and b whose values are simply their week numbers:

a = np.array([[1,2,3,4,5,6]])
b = np.array([[0,1,2,3,4,5]])

I want to know if it is possible to combine them, e.g. using some method that takes an offset argument in a fashion like combine((a, b), axis=0, offset=-1), such that the resulting array (let's call it c) looks like this:

print c
[[NaN 1   2   3   4   5   6  ]
 [0   1   2   3   4   5   NaN]]

What is more, since the time series are enormous, I must stream them through my program, and therefore cannot know all offsets at the same time. I thought of using Pandas because it has nice indexing, but I felt there had to be a simpler way, since the essence of what I'm trying to do is super simple.

Update: This seems to work

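import numpy as np

def offset_stack(a, b, offset=0):
    # Flatten and cast to float first: NaN cannot be stored in an integer array
    a = np.asarray(a, dtype=float).ravel()
    b = np.asarray(b, dtype=float).ravel()
    if offset < 0:
        a = np.insert(a, [0] * abs(offset), np.nan)
        b = np.append(b, [np.nan] * abs(offset))
    if offset > 0:
        a = np.append(a, [np.nan] * abs(offset))
        b = np.insert(b, [0] * abs(offset), np.nan)

    return np.concatenate(([a],[b]), axis=0)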

Upvotes: 2

Views: 2362

Answers (3)

hpaulj

Reputation: 231625

pad and concatenate (and the various stack and inserts) create a target array of the right size, and fill values from the input arrays. So we can do the same, and potentially do it faster.

For example, using your two arrays and a one-step offset:

In [283]: a = np.array([[1,2,3,4,5,6]])
In [284]: b = np.array([[0,1,2,3,4,5]])

Create the target array and fill it with the pad value. np.nan is a float, so the target has to be float even though a is int:

In [285]: m=a.shape[0]+b.shape[0]    
In [286]: n=a.shape[1]+1    
In [287]: c=np.zeros((m,n),float)
In [288]: c.fill(np.nan)

Now just copy values into the right places on the target. More arrays and offsets will require some generalization here.

In [289]: c[:a.shape[0],1:]=a
In [290]: c[-b.shape[0]:,:-1]=b

In [291]: c
Out[291]: 
array([[ nan,   1.,   2.,   3.,   4.,   5.,   6.],
       [  0.,   1.,   2.,   3.,   4.,   5.,  nan]])
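
One way that generalization could look, as a sketch rather than a definitive implementation. The helper name stack_with_offsets and its starts argument are made up for illustration, and it assumes 1-D series whose start positions (e.g. week numbers) are all known when it is called:

```python
import numpy as np

def stack_with_offsets(series, starts, fill=np.nan):
    """Stack 1-D series as rows, placing series[i] at column starts[i] - min(starts).

    Hypothetical helper: the name and signature are illustrative only.
    """
    series = [np.asarray(s, dtype=float).ravel() for s in series]
    if len(series) != len(starts):
        raise ValueError("need one start position per series")
    first = min(starts)                                        # earliest start
    last = max(st + len(s) for s, st in zip(series, starts))   # one past the latest end
    # Allocate the target once, pre-filled with the pad value
    out = np.full((len(series), last - first), fill, dtype=float)
    # Copy every series into its own row at the right column offset
    for row, (s, st) in enumerate(zip(series, starts)):
        out[row, st - first:st - first + len(s)] = s
    return out

a = np.array([1, 2, 3, 4, 5, 6])
b = np.array([0, 1, 2, 3, 4, 5])
c = stack_with_offsets([a, b], starts=[1, 0])
```

Here a starts at week 1 and b at week 0, so c has the same layout as Out[291] above.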

Upvotes: 1

MSeifert

Reputation: 152775

There is a really simple way to accomplish this.

You basically want to pad and then stack your arrays, and NumPy has a function for each:

numpy.pad() aka offset

import numpy as np
from numpy import pad  # np.pad is the public, top-level name for this function

a = np.array([[1,2,3,4,5,6]], dtype=float) # float because NaN is a float value!
b = np.array([[0,1,2,3,4,5]], dtype=float)

print(pad(a, ((0,0),(1,0)), mode='constant', constant_values=np.nan))
# [[ nan   1.   2.   3.   4.   5.   6.]]
print(pad(b, ((0,0),(0,1)), mode='constant', constant_values=np.nan))
# [[  0.   1.   2.   3.   4.   5.  nan]]

The ((0,0),(1,0)) means no padding along the first axis (top/bottom) and, along the second axis, one element on the left and none on the right. Tweak these values if you want a larger or smaller shift.
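
To make the pad_width layout concrete, here is a small sketch using np.pad that shifts a by three positions instead of one. The variable name shifted is only for illustration:

```python
import numpy as np

a = np.array([[1, 2, 3, 4, 5, 6]], dtype=float)

# pad_width holds one (before, after) pair per axis:
#   (0, 0) -> no rows added above or below
#   (3, 0) -> three columns added on the left, none on the right
shifted = np.pad(a, ((0, 0), (3, 0)), mode='constant', constant_values=np.nan)

print(shifted.shape)  # (1, 9)
```

The three new leading columns hold NaN and the original six values follow unchanged.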

numpy.vstack() aka stack along axis=0

import numpy as np
from numpy import pad

a_padded = pad(a, ((0,0),(1,0)), mode='constant', constant_values=np.nan)
b_padded = pad(b, ((0,0),(0,1)), mode='constant', constant_values=np.nan)

np.vstack([a_padded, b_padded])
# array([[ nan,   1.,   2.,   3.,   4.,   5.,   6.],
#        [  0.,   1.,   2.,   3.,   4.,   5.,  nan]])

Your function:

Combining these two is straightforward, and the result is easy to extend:

import numpy as np
from numpy import pad

def offset_stack(a, b, axis=0, offsets=(0, 1)):
    # offsets needs exactly one entry per array dimension
    if (len(offsets) != a.ndim) or (a.ndim != b.ndim):
        raise ValueError('Offsets and dimensions of the arrays do not match.')
    # Per axis: a is padded in front for offset >= 0 (behind for offset < 0); b gets the mirror image
    offset1 = [(0, -offset) if offset < 0 else (offset, 0) for offset in offsets]
    offset2 = [(-offset, 0) if offset < 0 else (0, offset) for offset in offsets]
    a_padded = pad(a, offset1, mode='constant', constant_values=np.nan)
    b_padded = pad(b, offset2, mode='constant', constant_values=np.nan)
    return np.concatenate([a_padded, b_padded], axis=axis)

offset_stack(a, b)

This function handles offsets in any number of dimensions and can stack along any axis. It does not behave exactly like the original: your series lie along the second dimension, so that is the one you need to pad, whereas a lone offset of 1 would refer to the first dimension. As long as you keep track of the dimensions of your arrays, it should work fine.

For example:

offset_stack(a, b, offsets=(1,2))
array([[ nan,  nan,  nan,  nan,  nan,  nan,  nan,  nan],
       [ nan,  nan,   1.,   2.,   3.,   4.,   5.,   6.],
       [  0.,   1.,   2.,   3.,   4.,   5.,  nan,  nan],
       [ nan,  nan,  nan,  nan,  nan,  nan,  nan,  nan]])

or for 3d arrays:

a = np.array([1,2,3], dtype=float)[None, :, None] # makes it 3d
b = np.array([0,1,2], dtype=float)[None, :, None] # makes it 3d

offset_stack(a, b, offsets=(0,1,0), axis=2)
array([[[ nan,   0.],
        [  1.,   1.],
        [  2.,   2.],
        [  3.,  nan]]])
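
If keeping track of one offset per dimension is more than you need, a thin wrapper can recover the scalar-offset call from the question. This is only a sketch: the name offset_stack_rows is hypothetical, it assumes both inputs are 2-D with the series as rows and the same number of columns, and it follows the sign convention from the question (offset=-1 pads a on the left and b on the right):

```python
import numpy as np

def offset_stack_rows(a, b, offset=0, fill=np.nan):
    """Stack 2-D row arrays a and b vertically, shifted along the columns.

    Hypothetical wrapper: offset < 0 pads a on the left and b on the right,
    offset > 0 does the opposite (the convention used in the question).
    """
    a = np.asarray(a, dtype=float)  # float so that NaN can be stored
    b = np.asarray(b, dtype=float)
    k = abs(offset)
    left = ((0, 0), (k, 0))   # k columns in front, rows untouched
    right = ((0, 0), (0, k))  # k columns behind, rows untouched
    a_pad, b_pad = (left, right) if offset < 0 else (right, left)
    a_padded = np.pad(a, a_pad, mode='constant', constant_values=fill)
    b_padded = np.pad(b, b_pad, mode='constant', constant_values=fill)
    return np.vstack([a_padded, b_padded])

a = np.array([[1, 2, 3, 4, 5, 6]])
b = np.array([[0, 1, 2, 3, 4, 5]])
c = offset_stack_rows(a, b, offset=-1)
```

With offset=-1 this gives the two-row layout asked for in the question; offset=0 simply stacks the inputs without padding.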

Upvotes: 2

Colonel Beauvel

Reputation: 31181

You can do this in NumPy:

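import numpy as np

def f(a, b, n):
    # n < 0: a starts |n| positions before b; n > 0: b starts n positions before a
    v = np.full(abs(n), np.nan)  # block of |n| NaNs used as padding
    if n < 0:
        return np.vstack((np.append(a, v), np.append(v, b)))
    elif n > 0:
        return np.vstack((np.append(v, a), np.append(b, v)))
    else:
        return np.vstack((a, b))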

#In [148]: a = np.array([23, 13, 4, 12, 4, 4])

#In [149]: b = np.array([4, 12, 3, 41, 45, 6])

#In [150]: f(a,b,-2)
#Out[150]:
#array([[ 23.,  13.,   4.,  12.,   4.,   4.,  nan,  nan],
#       [ nan,  nan,   4.,  12.,   3.,  41.,  45.,   6.]])

#In [151]: f(a,b,2)
#Out[151]:
#array([[ nan,  nan,  23.,  13.,   4.,  12.,   4.,   4.],
#       [  4.,  12.,   3.,  41.,  45.,   6.,  nan,  nan]])

#In [152]: f(a,b,0)
#Out[152]:
#array([[23, 13,  4, 12,  4,  4],
#       [ 4, 12,  3, 41, 45,  6]])

Upvotes: 2
