Reputation: 4206

Binning a numpy array

I have a numpy array which contains time series data. I want to bin that array into equal partitions of a given length (it is fine to drop the last partition if it is not the same size) and then calculate the mean of each of those bins.

I suspect there is numpy, scipy, or pandas functionality to do this.

example:

data = [4,2,5,6,7,5,4,3,5,7]

for a bin size of 2:

bin_data = [(4,2),(5,6),(7,5),(4,3),(5,7)]
bin_data_mean = [3,5.5,6,3.5,6]

for a bin size of 3:

bin_data = [(4,2,5),(6,7,5),(4,3,5)]
bin_data_mean = [3.67,6,4]

Upvotes: 15

Answers (4)

Alexandre Kempf

Reputation: 989

I just wrote a function that applies this to an array of any size or dimension.

data is your array
axis is the axis you want to bin
binstep is the number of points between the starts of consecutive bins (allows overlapping bins)
binsize is the size of each bin

func is the function you want to apply to each bin (np.max for max pooling, np.mean for an average, ...)

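import numpy as np

def binArray(data, axis, binstep, binsize, func=np.nanmean):
    data = np.array(data)
    dims = np.array(data.shape)
    # Swap the binned axis to the front so bins can be taken along axis 0
    argdims = np.arange(data.ndim)
    argdims[0], argdims[axis] = argdims[axis], argdims[0]
    data = data.transpose(argdims)
    # Count only bins that fit entirely inside the array (avoids out-of-range indices when bins overlap)
    nbins = (dims[axis] - binsize) // binstep + 1
    data = [func(np.take(data, np.arange(int(i*binstep), int(i*binstep+binsize)), 0), 0) for i in range(nbins)]
    # Swap the axes back to their original order
    data = np.array(data).transpose(argdims)
    return data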

In your case it will be:

data = [4,2,5,6,7,5,4,3,5,7]
bin_data_mean = binArray(data, 0, 2, 2, np.mean)

or for the bin size of 3:

bin_data_mean = binArray(data, 0, 3, 3, np.mean)
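
The overlapping-bin idea (binstep smaller than binsize) can also be sketched for the 1-D case with NumPy's sliding_window_view, assuming NumPy 1.20 or later is available. The helper name windowed_reduce is hypothetical and not part of the function above; this is only an illustration of the same binsize/binstep semantics.

```python
import numpy as np
from numpy.lib.stride_tricks import sliding_window_view

def windowed_reduce(data, binsize, binstep, func=np.mean):
    """Apply func to 1-D windows of length binsize that start every binstep samples."""
    data = np.asarray(data)
    # All full-length windows, shape (len(data) - binsize + 1, binsize).
    windows = sliding_window_view(data, binsize)
    # Keep every binstep-th window, then reduce each remaining window.
    return func(windows[::binstep], axis=1)

data = [4, 2, 5, 6, 7, 5, 4, 3, 5, 7]
non_overlapping = windowed_reduce(data, binsize=3, binstep=3)  # bins start at 0, 3, 6
overlapping = windowed_reduce(data, binsize=3, binstep=2)      # bins start at 0, 2, 4, 6
```

Only windows that fit entirely inside the array are produced, so a short trailing bin is dropped automatically.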

Upvotes: 5

Óscar López

Reputation: 236004

Try this, using standard Python (NumPy isn't necessary for this). Assuming Python 2.x is in use:

data = [ 4, 2, 5, 6, 7, 5, 4, 3, 5, 7 ]

# example: for n == 2
n=2
partitions = [data[i:i+n] for i in xrange(0, len(data), n)]
partitions = partitions if len(partitions[-1]) == n else partitions[:-1]

# the above produces a list of lists
partitions
=> [[4, 2], [5, 6], [7, 5], [4, 3], [5, 7]]

# now the mean
[sum(x)/float(n) for x in partitions]
=> [3.0, 5.5, 6.0, 3.5, 6.0]
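
For Python 3, where xrange no longer exists, the same approach might look like the sketch below. The function name bin_means is an illustrative choice, and statistics.mean from the standard library stands in for the manual sum and division.

```python
from statistics import mean

def bin_means(values, n):
    """Mean of each full chunk of n consecutive values; a short trailing chunk is dropped."""
    # Length of the prefix that divides evenly into chunks of size n.
    usable = len(values) - len(values) % n
    return [mean(values[i:i + n]) for i in range(0, usable, n)]

data = [4, 2, 5, 6, 7, 5, 4, 3, 5, 7]
pairs = bin_means(data, 2)
triples = bin_means(data, 3)
```

Trimming the length up front also means an empty input simply returns an empty list.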

Upvotes: 5

Joe Kington

Reputation: 284612

Just use reshape and then mean(axis=1).

As the simplest possible example:

import numpy as np

data = np.array([4,2,5,6,7,5,4,3,5,7])

print(data.reshape(-1, 2).mean(axis=1))

More generally, we'd need to do something like this to drop the last bin when the array length isn't an exact multiple of the bin width:

import numpy as np

width=3
data = np.array([4,2,5,6,7,5,4,3,5,7])

result = data[:(data.size // width) * width].reshape(-1, width).mean(axis=1)

print(result)
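
The trim-then-reshape idea can be wrapped in a small helper that also bins along a chosen axis of a multi-dimensional array. This is a sketch only: the name bin_mean and the axis parameter are assumptions added for illustration.

```python
import numpy as np

def bin_mean(data, width, axis=-1):
    """Average consecutive groups of `width` samples along `axis`, dropping any remainder."""
    data = np.moveaxis(np.asarray(data), axis, -1)  # put the binned axis last
    n_bins = data.shape[-1] // width
    trimmed = data[..., :n_bins * width]            # drop the incomplete last bin
    # Split the last axis into (n_bins, width) and average over each bin.
    binned = trimmed.reshape(data.shape[:-1] + (n_bins, width)).mean(axis=-1)
    return np.moveaxis(binned, -1, axis)            # restore the original axis order

series = np.array([4, 2, 5, 6, 7, 5, 4, 3, 5, 7])
means_of_three = bin_mean(series, 3)

grid = np.arange(12).reshape(2, 6)
row_bins = bin_mean(grid, 2, axis=1)  # bins of 2 along each row
col_bins = bin_mean(grid, 2, axis=0)  # bins of 2 down each column
```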

Upvotes: 25

TomAugspurger

Reputation: 28946

Since you already have a numpy array, to avoid for loops, you can use reshape and consider the new dimension to be the bin:

In [33]: data.reshape(-1, 2)
Out[33]: 
array([[4, 2],
       [5, 6],
       [7, 5],
       [4, 3],
       [5, 7]])

In [34]: data.reshape(-1, 2).mean(1)
Out[34]: array([ 3. ,  5.5,  6. ,  3.5,  6. ])

Actually, this only works if the size of data is divisible by the bin size. I'll edit in a fix.

Looks like Joe Kington has an answer that handles that.

Upvotes: 6
