deathracer

Reputation: 305

Merge multiple time series of different lengths into a 2D matrix

I have an array of timestamp arrays, where each timestamp array has a different length. For example,

[arr1, arr2, arr3, ...]

arr1 = [0.24, 0.56, 0.77]
arr2 = [0.1, 0.24]
arr3 = [0.6, 0.7, 0.72, 0.88]

This is what the output should look like:

NaN, 0.24, 0.56, NaN, NaN,  NaN, 0.77,  NaN
0.1, 0.24,  NaN, NaN, NaN,  NaN,  NaN,  NaN
NaN,  NaN,  NaN, 0.6, 0.7, 0.72,  NaN, 0.88

How do I merge all of these arrays into a single 2D matrix? Note that each individual array (arr1, arr2, ...) is very large (tens of thousands of elements).

I suspect the pandas merge function could be used, but I don't know how to proceed with it.

Upvotes: 2

Views: 321

Answers (2)

G.G

Reputation: 765

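import pandas as pd

arr1 = [0.24, 0.56, 0.77]
arr2 = [0.1, 0.24]
arr3 = [0.6, 0.7, 0.72, 0.88]
list2 = [arr1, arr2, arr3]

# All distinct timestamps in sorted order; these become the columns
ss1 = pd.Series(pd.DataFrame(list2).to_numpy().reshape(-1)).dropna().drop_duplicates().sort_values()
# One row per array: keep a timestamp where the array contains it, NaN elsewhere
pd.Series(list2).apply(lambda x: ss1.where(lambda ss: ss.isin(x)))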

   4     0     1    8    9     10    2     11
0  NaN  0.24  0.56  NaN  NaN   NaN  0.77   NaN
1  0.1  0.24   NaN  NaN  NaN   NaN   NaN   NaN
2  NaN   NaN   NaN  0.6  0.7  0.72   NaN  0.88
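
Since the question mentions arrays with tens of thousands of elements, the same row-per-array matrix could also be built directly with NumPy. This is only a minimal sketch, not part of the original answer: the names `columns`, `matrix`, and `positions` are made up for illustration, and it assumes every timestamp is an exact float that can be matched by equality.

```python
import numpy as np

arr1 = [0.24, 0.56, 0.77]
arr2 = [0.1, 0.24]
arr3 = [0.6, 0.7, 0.72, 0.88]
arrs = [arr1, arr2, arr3]

# Sorted union of all timestamps: one column per distinct value.
columns = np.unique(np.concatenate(arrs))

# Start from an all-NaN matrix with one row per input array.
matrix = np.full((len(arrs), columns.size), np.nan)

for row, arr in enumerate(arrs):
    # Every value of arr is present in columns, so searchsorted
    # returns the exact column position of each value.
    positions = np.searchsorted(columns, arr)
    matrix[row, positions] = arr

print(columns)
print(matrix)
```

Here `columns` holds the timestamp that each matrix column stands for, so it plays the role of the column labels in the pandas output above.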

Upvotes: -1

Quang Hoang

Reputation: 150805

Here's an approach with pandas:

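import pandas as pd

# arr1, arr2, arr3 as defined in the question
arrs = [arr1, arr2, arr3]

# convert each array to a series indexed by its own values
series = [pd.Series(arr, index=arr) for arr in arrs]

# concatenate column-wise; pandas aligns the series on their index
pd.concat(series, axis=1)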

Output:

         0     1     2
0.10   NaN  0.10   NaN
0.24  0.24  0.24   NaN
0.56  0.56   NaN   NaN
0.60   NaN   NaN  0.60
0.70   NaN   NaN  0.70
0.72   NaN   NaN  0.72
0.77  0.77   NaN   NaN
0.88   NaN   NaN  0.88
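
The output above has one column per array, while the question asks for one row per array. A possible way to get that layout is to sort the index and transpose the result. This is a sketch only: the name `wide` is made up here, and it assumes no array contains the same timestamp twice, since aligning on a duplicated index may fail.

```python
import pandas as pd

arr1 = [0.24, 0.56, 0.77]
arr2 = [0.1, 0.24]
arr3 = [0.6, 0.7, 0.72, 0.88]
arrs = [arr1, arr2, arr3]

# One series per array, indexed by its own timestamps.
series = [pd.Series(arr, index=arr) for arr in arrs]

# Align on the timestamps, sort them explicitly, then transpose
# so that each input array becomes one row.
wide = pd.concat(series, axis=1).sort_index().T

print(wide)
# wide.to_numpy() gives the plain 2D matrix without labels.
```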

Upvotes: 2
