nico
nico

Reputation: 386

Average multiple vectors of points of different lengths in python

I have a list of trajectories for different trials that themselves are a list of points...

trajectories = [[(x,y),(x,y), ...], [(x,y), ...], ...]

the number of points vary from trial to trial.

The final goal is to plot average trajectory +/- SEM (Standard Error of Mean) across trials.

As far as I can understand I should get the longest trajectory and for each of the remaining trajectories add 'resolution' to the other vectors so they are the same length, so something like this:

#find the maximum length
max_len = len(trajectories[0])
longest = []
for t in trajectories:
    if len(t) > max_len:
        max_len = len(t)
        longest = t
# now transform the other vectors I assume using the longest vector or the length of this vector 
newTrajectories = []
for i,t in enumerate(trajectories):
    newTrajectories[i] = resample(t, longest or max_len, or something similar!!)

Is there a function that given a vec of tuples (x,y) of len X and another of len Y where X>Y adds points (x,y) to the Y vec in the right places like using the average of previous and following point or the median?

Edit: The simplest example I can think of is using 2 vectors of trajectories:

vec_one = [(2,4),(3,5),(1,6)]
vec_two = [(2,4), (1,6)]

Both of them start from x=2, y=4 and end up in x=1, y=6 vec_one however is longer (it took more time). I figure that to be able to average across trajectories, vec_two needs to be longer so I would need to extrapolate, in some way, the values of the x,y position that is missing.

I've been looking at splprep, splrep and splev of the scypi.interpolate module, but I'm afraid I don't quite understand them yet.

Edit2: Effectively I'm trying to abstract out time from an (x,y) time series. So the problem becomes where to introduce new values and by which criteria I select a 'site' for inserting values, the way I extrapolate the values seems now less impotant...

Upvotes: 0

Views: 2528

Answers (1)

nico
nico

Reputation: 386

Unfortunately no takers, here is a solution I find works ok.

I had to change the format of the data to solve this. So instead of having a list of trials with variable number of (x, y) points: [[(x, y), (x, y), ...], [(x, y), (...), ...]]

I now have 3 numpy.arrays:

sx = array([ 23, 34, 42, ..., 56, 56, 63])

sy = array([ 78, 94, 20, ..., 44, 38, 34])

st = array([1, 1, 1, ..., 293, 293, 293])

All vectors are the same length as they are essentially part of a table, where sx is the column with all the x positions, sy is all the y positions and st is the trial number (or the list ID of the x and y positions). st is basically a bunch of repeated numbers [1,1,1,1,1,1,2,2,2,2,2,3,3,3,3,3,...]

(I'm actually using HDF5/pytables to store my data and its a direct read from the table that contains the tracking data)

This solution uses interp1d

from scipy.interpolate import interp1d

as well as numpy of course

import numpy as np

I admit its a hacked solution and not really fast but it works :) On the other hand re-reading my own question made me think it was not a very clear exposition of my problem... sorry for that. Anyway here is the solution.

The following func receives the 3 vecs I described above, a trialList, which is a list of trials to collapse across and a kind, which is the type of collapsing you want to do, can be mean or median for now. It will return the collapsed trajectory i.e. the x and y positions of the average or the median of the trialList

def collapseTrajectories(sx, sy, st, trialList, kind='median'):
    # find the longest trial to use as template
    l = 0
    tr = []
    for t in trialList:
        if len(st[st==t]) > l:
            l = len(st[st==t])
            tr = t

    # Make all vectors the same length by interpolating the values
    xnew = np.linspace(0, 640, l)
    ynew = np.linspace(0, 480, l)
    sx_new = []
    sy_new = []

    for t in trialList:
        if len(st[st==t]) > 3:
            X = sx[st==t]
            Y = sy[st==t]
            x = np.linspace(0,640, len(X))
            y = np.linspace(0,480,len(Y))
            fx = interp1d(x, X, kind='cubic')
            fy = interp1d(y, Y, kind='cubic')
            sx_new.append(fx(xnew))
            sy_new.append(fy(ynew))

    # Collapse using the appropriate kind
    if kind == 'median':
        out_x = np.median(sx_new, axis=0)
        out_y = np.median(sy_new, axis=0)
    elif kind=='mean':
        out_x = np.mean(sx_new, axis=0)
        out_y = np.mean(sy_new, axis=0)

    return out_x, out_y

Upvotes: 1

Related Questions