arilwan

Reputation: 3993

Calculating some statistics for each column of a numpy ndarray

I have a 4D numpy array of input data in which each column represents a quantity (say speed, acceleration, etc.), and I would like to calculate some statistics for each quantity: mean, standard deviation, median, and the 75th, 85th and 95th percentiles.

So for example:

import numpy as np

input_shape = (1,200,4)
n_sample = 100

X = np.random.uniform(0,1, (n_sample,) + input_shape)
X.shape
(100, 1, 200, 4)

X[0]
array([[[0.50410922, 0.82829892, 0.72460878, 0.0562701 ],
        [0.49223423, 0.14152948, 0.32285973, 0.49056405],
        ...
        [0.8299407 , 0.78446729, 0.40959698, 0.893117  ],
        [0.25150705, 0.56759064, 0.28280459, 0.0599566 ]]])

Each column of X represents some physical quantity for 200 data points. The statistics of each quantity are what I'm interested in.

EDIT

I would expect something like:

[[[col1_mean, col2_mean, col3_mean, col4_mean ],
   [col1_std, col2_std, col3_std, col4_std ],
   [col1_med, col2_med, col3_med, col4_med],
   [col1_p75, col2_p75, col3_p75, col4_p75 ],
   [col1_p85, col2_p85, col3_p85, col4_p85 ],
   [col1_p95, col2_p95, col3_p95, col4_p95 ]]]

So the result has shape (100, 1, 6, 4).

Upvotes: 0

Views: 598

Answers (2)

bnaecker

Reputation: 6440

The easiest thing would be to compute the statistics of interest by supplying an axis argument. This is used by many NumPy functions to run their computation along that axis. For your data, it seems you'd like to compute across the "data points" dimension, which is axis=2. For example:

>>> import numpy as np
>>> input_shape = (1,200,4)
>>> n_sample = 100
>>> X = np.random.uniform(0,1, (n_sample,) + input_shape)
>>> X.shape
(100, 1, 200, 4)
>>> X.mean(axis=2).shape  # Compute mean along 3rd axis
(100, 1, 4)
>>> stat_functions = (np.mean, np.std, np.median)
>>> stats = [func(X, axis=2) for func in stat_functions]
>>> list(map(np.shape, stats))
[(100, 1, 4), (100, 1, 4), (100, 1, 4)]

You'll have to do a bit more work to create functions to compute the percentiles you're interested in:

>>> import functools
>>> percentiles = tuple(functools.partial(np.percentile, q=q) for q in (75, 85, 95))
>>> stat_functions = (np.mean, np.std, np.median) + percentiles
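As an alternative to building one partial per percentile, np.percentile also accepts a sequence for q. The sketch below assumes the same array shape as in the question and uses a seeded generator only so the run is repeatable; the variable name pcts is just illustrative. Note that the percentile axis comes out first, so it has to be moved if you want it where the data-point axis was.

```python
import numpy as np

# Same shape as in the question; the seed is arbitrary.
rng = np.random.default_rng(0)
X = rng.uniform(0, 1, (100, 1, 200, 4))

# A sequence for q computes all percentiles in one call.
# The new percentile axis is prepended to the result.
pcts = np.percentile(X, [75, 85, 95], axis=2)
print(pcts.shape)  # (3, 100, 1, 4)

# Move the percentile axis to the position of the reduced axis.
pcts = np.moveaxis(pcts, 0, 2)
print(pcts.shape)  # (100, 1, 3, 4)
```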

If you want to join these into a single array, you can use the keepdims kwarg of each to avoid removing the axis along which the function is applied, and then concatenate the results:

>>> stats = np.concatenate([func(X, axis=2, keepdims=True) for func in stat_functions], axis=2)
>>> stats.shape
(100, 1, 6, 4)
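Putting the pieces together, a self-contained sketch might look like the following. The helper name column_stats and its parameters are hypothetical, chosen here for illustration; it assumes the statistics are wanted along axis 2, as in the question.

```python
import numpy as np


def column_stats(X, axis=2, percentiles=(75, 85, 95)):
    """Hypothetical helper: stack mean, std, median and percentiles along `axis`."""
    # keepdims=True keeps the reduced axis with length 1 so results can be concatenated.
    basic = [func(X, axis=axis, keepdims=True) for func in (np.mean, np.std, np.median)]
    pcts = [np.percentile(X, q, axis=axis, keepdims=True) for q in percentiles]
    return np.concatenate(basic + pcts, axis=axis)


rng = np.random.default_rng(0)
X = rng.uniform(0, 1, (100, 1, 200, 4))

stats = column_stats(X)
print(stats.shape)  # (100, 1, 6, 4)
```

Along axis 2, the rows are ordered mean, std, median, p75, p85, p95. Keep in mind that np.std uses ddof=0 (the population standard deviation) by default; pass ddof=1 yourself if you need the sample estimate.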

Upvotes: 2

You can do it with a loop over the column indices. For example, if you try this:

print(X[0][0][:,0])

it prints the first column, so you can iterate over the columns, append each one to a list, then calculate the median and standard deviation.
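A rough sketch of that loop for a single sample could look like this. The names sample, rows and per_column are illustrative only, and the array is generated the same way as in the question. For the full array, an axis argument will generally be faster than a Python loop.

```python
import numpy as np

rng = np.random.default_rng(0)
X = rng.uniform(0, 1, (100, 1, 200, 4))

sample = X[0][0]  # one sample, shape (200, 4)

rows = []
for j in range(sample.shape[1]):
    col = sample[:, j]  # all 200 data points of column j
    rows.append([np.mean(col), np.std(col), np.median(col)])

# One row per column: [mean, std, median]
per_column = np.array(rows)
print(per_column.shape)  # (4, 3)
```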

Upvotes: 0
