Reputation: 13
I have a dataset where each time point is represented by a set of sparse x and y values. For data storage purposes, if y = 0, that data point is not recorded.
Imagine data point t0:
#Real data
#t0
x0 = [200, 201, 202, 203, 204, 205, 206, 207, ...]
y0 = [5, 10, 0, 7, 0, 0, 15, 20, ...]
#Data stored
#t0
x0 = [200, 201, 203, 206, 207, ...]
y0 = [5, 10, 7, 15, 20, ...]
Now, imagine I have data point t1:
#Data stored
#t1
x1 = [201, 204, 206, 207, ...]
y1 = [10, 15, 3, 20, ...]
Is there a simple and efficient way to rebuild the full dataset for a custom number of data points? Let's say I want a data structure that represents all data contained in t0 + t1:
#t0+t1
M = [[200, 201, 203, 204, 206, 207, ...], # this contains all xs recorded for both t0 and t1
[5, 10, 7, 0, 15, 20, ... ], # y values from t0. Missing values are filled with 0
[0, 10, 0, 15, 3, 20, ...] # y values from t1. Missing values are filled with 0
]
Any help would be really appreciated!
Upvotes: 1
Views: 51
Reputation: 5949
It looks like np.searchsorted is what you are looking for:
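import numpy as np
# Sorted union of every x recorded in t0 and t1 (works for lists or arrays)
m0 = np.unique(np.concatenate([x0, x1]))
M = np.zeros((3, len(m0)), dtype=int)
M[0] = m0
# searchsorted returns the column of each stored x within m0
M[1, np.searchsorted(m0, x0)] = y0
M[2, np.searchsorted(m0, x1)] = y1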
>>> M
array([[200, 201, 203, 204, 206, 207],
[ 5, 10, 7, 0, 15, 20],
[ 0, 10, 0, 15, 3, 20]])
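Since the question asks about a custom number of time points, here is a minimal sketch of how the same idea might be wrapped in a loop. The helper name `rebuild_dense` and its argument layout are my own choices, not part of NumPy, and it assumes the x values within each time point are unique and that all y values are integers:

```python
import numpy as np

def rebuild_dense(xs, ys):
    """Stack sparse (x, y) records into one dense integer matrix.

    xs[i] holds the x values stored for time point i, ys[i] the matching
    y values. Row 0 of the result is the sorted union of all x values;
    row i + 1 holds the y values of time point i, with 0 where nothing
    was stored.
    """
    all_x = np.unique(np.concatenate(xs))
    M = np.zeros((len(xs) + 1, len(all_x)), dtype=int)
    M[0] = all_x
    for row, (x, y) in enumerate(zip(xs, ys), start=1):
        # Column of each stored x within the sorted union
        M[row, np.searchsorted(all_x, x)] = y
    return M

# The stored t0 and t1 from the question, without the elided values
x0, y0 = [200, 201, 203, 206, 207], [5, 10, 7, 15, 20]
x1, y1 = [201, 204, 206, 207], [10, 15, 3, 20]
M = rebuild_dense([x0, x1], [y0, y1])
```

Adding t2, t3, ... then only means appending to the two lists passed in. If the y values are floats, the dtype would need to change accordingly.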
Upvotes: 2