Reputation: 3993
I have a 4D numpy array of input data where each column represents a quantity (say speed, acceleration, etc.) and I would like to calculate some statistics for each quantity: mean, standard deviation, median, and the 75th, 85th and 95th percentiles.
So for example:
input_shape = (1,200,4)
n_sample = 100
X = np.random.uniform(0,1, (n_sample,) + input_shape)
X.shape
(100, 1, 200, 4)
X[0]
array([[[0.50410922, 0.82829892, 0.72460878, 0.0562701 ],
[0.49223423, 0.14152948, 0.32285973, 0.49056405],
...
[0.8299407 , 0.78446729, 0.40959698, 0.893117 ],
[0.25150705, 0.56759064, 0.28280459, 0.0599566 ]]])
Each column of X represents some physical quantity for 200 data points. The statistics of each quantity are what I'm interested in.
EDIT
I would expect something like:
[[[col1_mean, col2_mean, col3_mean, col4_mean ],
[col1_std, col2_std, col3_std, col4_std],
[col1_med, col2_med, col3_med, col4_med],
[col1_p75, col2_p75, col3_p75, col4_p75 ],
[col1_p85, col2_p85, col3_p85, col4_p85 ],
[col1_p95, col2_p95, col3_p95, col4_p95 ]]]
So the result is shaped (100, 1, 6, 4)
Upvotes: 0
Views: 598
Reputation: 6440
The easiest thing would be to compute the statistics of interest by supplying an axis argument. Many NumPy functions use it to run their computation along that axis. For your data, it seems you'd like to compute across the "data points" dimension, which is axis=2. For example:
>>> import numpy as np
>>> input_shape = (1,200,4)
>>> n_sample = 100
>>> X = np.random.uniform(0,1, (n_sample,) + input_shape)
>>> X.shape
(100, 1, 200, 4)
>>> X.mean(axis=2).shape # Compute mean along 3rd axis
(100, 1, 4)
>>> stat_functions = (np.mean, np.std, np.median)
>>> stats = [func(X, axis=2) for func in stat_functions]
>>> list(map(np.shape, stats))
[(100, 1, 4), (100, 1, 4), (100, 1, 4)]
You'll have to do a bit more work to create functions to compute the percentiles you're interested in:
>>> import functools
>>> percentiles = tuple(functools.partial(np.percentile, q=q) for q in (75, 85, 95))
>>> stat_functions = (np.mean, np.std, np.median) + percentiles
If you want to join these into a single array, you can use the keepdims kwarg of each to avoid removing the axis along which the function is applied, and then concatenate the results:
>>> stats = np.concatenate([func(X, axis=2, keepdims=True) for func in stat_functions], axis=2)
>>> stats.shape
(100, 1, 6, 4)
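As a possible variation on the above, np.percentile also accepts a sequence for q, so the three percentiles can be computed in one call instead of three partials. The sketch below assumes the same (100, 1, 200, 4) layout; the helper name column_stats and its parameters are made up for illustration. With a sequence q, the percentiles come back on a new leading axis, so it has to be moved into place before concatenating:

```python
import numpy as np

def column_stats(X, axis=2, qs=(75, 85, 95)):
    """Hypothetical helper: stack mean, std, median and percentiles along `axis`."""
    mean = X.mean(axis=axis, keepdims=True)
    std = X.std(axis=axis, keepdims=True)
    median = np.median(X, axis=axis, keepdims=True)
    # With a sequence q, the result has shape (len(qs),) + reduced shape,
    # e.g. (3, 100, 1, 4) for the example data.
    pct = np.percentile(X, qs, axis=axis)
    # Move the percentile axis to where the reduced axis used to be.
    pct = np.moveaxis(pct, 0, axis)
    return np.concatenate([mean, std, median, pct], axis=axis)

X = np.random.uniform(0, 1, (100, 1, 200, 4))
stats = column_stats(X)
print(stats.shape)
```

Note that np.std uses ddof=0 (population standard deviation) by default; pass ddof=1 if you want the sample standard deviation instead.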
Upvotes: 2
Reputation: 87
You can do it with a loop over the indexes. For example, if you try this:
print(X[0][0][:,0])
it prints the first column of the first sample, so you can iterate over the columns, append each one to a list, and then calculate the median and standard deviation.
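A rough sketch of that loop for the first sample only, assuming the X from the question; the list names medians and stds are just for illustration. It will be much slower than passing an axis argument if repeated over all samples:

```python
import numpy as np

X = np.random.uniform(0, 1, (100, 1, 200, 4))

medians = []
stds = []
for col in range(X.shape[-1]):
    column = X[0][0][:, col]  # the 200 data points of one quantity
    medians.append(np.median(column))
    stds.append(np.std(column))

print(medians)
print(stds)
```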
Upvotes: 0