Reputation: 1227
I have some time series data, say:
# [ [time] [ data ] ]
a = [[0,1,2,3,4],['a','b','c','d','e']]
b = [[0,3,4]['f','g','h']]
and I would like an output with some filler value, lets say None for now:
a_new = [[0,1,2,3,4],['a','b','c','d','e']]
b_new = [[0,1,2,3,4],['f',None,None,'g','h']]
Is there a built in function in python/numpy to do this (or something like this)? Basically I would like to have all of my time vectors of equal size so I can calculate statistics (np.mean) and deal with the missing data accordingly.
Upvotes: 4
Views: 624
Reputation: 221614
Here's a vectorized NumPythonic approach -
def align_arrays(A):
time, data = A
time_new = np.arange(np.max(time)+1)
data_new = np.full(time_new.size, None, dtype=object)
data_new[np.in1d(time_new,time)] = data
return time_new, data_new
Sample runs -
In [113]: a = [[0,1,2,3,4],['a','b','c','d','e']]
In [114]: align_arrays(a)
Out[114]: (array([0, 1, 2, 3, 4]), array(['a', 'b', 'c', 'd', 'e'], dtype=object))
In [115]: b = [[0,3,4],['f','g','h']]
In [116]: align_arrays(b)
Out[116]: (array([0, 1, 2, 3, 4]),array(['f', None, None, 'g', 'h'],dtype=object))
Upvotes: 0
Reputation: 1991
smarx has a fine answer, but pandas was made exactly for things like this.
# your data
a = [[0,1,2,3,4],['a','b','c','d','e']]
b = [[0,3,4],['f','g','h']]
# make an empty DataFrame (can do this faster but I'm going slow so you see how it works)
df_a = pd.DataFrame()
df_a['time'] = a[0]
df_a['A'] = a[1]
df_a.set_index('time',inplace=True)
# same for b (a faster way this time)
df_b = pd.DataFrame({'B':b[1]}, index=b[0])
# now merge the two Series together (the NaNs are in the right place)
df = pd.merge(df_a, df_b, left_index=True, right_index=True, how='outer')
In [28]: df
Out[28]:
A B
0 a f
1 b NaN
2 c NaN
3 d g
4 e h
Now the fun is just beginning. Within a DataFrame you can
compute all of your summary statistics (e.g. df.mean()
)
make plots (e.g. df.plot()
)
slice/dice your data basically however you want (e.g df.groupby()
)
Fill in or drop missing data using a specified method (e.g. df.fillna()
),
take quarterly or monthly averages (e.g. df.resample()
) and a lot more.
If you're just getting started (sorry for the infomercial it you aren't), I recommend reading 10 minutes to pandas for a quick overview.
Upvotes: 1
Reputation: 60143
How about this? (I'm assuming your definition of b
was a typo, and I'm also assuming you know in advance how many entries you want.)
>>> b = [[0,3,4], ['f','g','h']]
>>> b_new = [list(range(5)), [None] * 5]
>>> for index, value in zip(*b): b_new[1][index] = value
>>> b_new
[[0, 1, 2, 3, 4], ['f', None, None, 'g', 'h']]
Upvotes: 4