Reputation: 6328

Calculating the mean trajectory from many trajectories in numpy

I have many trajectory files, each with three columns giving the x, y and z position. I want to calculate the mean trajectory, defined as follows: for a given row, take the mean of x across all trajectories, and likewise for y and z.

So I iterate over these arrays, collecting all the x columns in one list and doing the same for y and z, and then calculate the mean. Sample code:

import numpy as np
import pandas as pd

file_list = ['test1_1', 'test2_4', 'test3_1', 'test4_3', 'test1_3']
position_data_list = []
for f in file_list:
    position_data = pd.read_csv(f) 
    position_data_list.append(position_data.values)

position_x_list = []
position_y_list = []
position_z_list = []
for position_data in position_data_list:
    px = position_data[:, 0]
    py = position_data[:, 1]
    pz = position_data[:, 2]
    position_x_list.append(px)
    position_y_list.append(py)
    position_z_list.append(pz)

position_x_list = np.array(position_x_list).T
position_y_list = np.array(position_y_list).T
position_z_list = np.array(position_z_list).T

position_x_mean = np.mean(position_x_list, axis=1)
position_y_mean = np.mean(position_y_list, axis=1)
position_z_mean = np.mean(position_z_list, axis=1)

Is there any better way to do the same?

Let me explain the above code. Suppose the files are file_1, file_2 and file_3. Each file has x, y and z columns, and each row is a time stamp, say t1, t2, t3, t4 and t5. The mean trajectory should contain all rows from t1 to t5, where the x value in row t1 is the mean of x in row t1 of file_1, file_2 and file_3, and so on.

Upvotes: 0

Answers (3)

Longwen Ou

Reputation: 879

pandas is very powerful and can do much more than just read data. Since you've already read the data into pandas dataframes, you can concatenate them and calculate the mean of each column with pandas. If you want the mean for each time stamp, you can use the groupby function. Assuming the column name for your time stamp is "ts", try the following:

import pandas as pd
file_list = ['test1_1', 'test2_4', 'test3_1', 'test4_3', 'test1_3']
df = pd.DataFrame()             # Create an empty dataframe
for file in file_list:
    df2 = pd.read_csv(file)     # Read data and store the results in df2
    df = pd.concat([df, df2])   # Concatenate your dataframes and store the results in df
print(df.groupby('ts').mean())  # Assuming 'ts' is the column of time stamp, print the results

Input:

file1: 

ts  x   y   z
t1  1   3   5
t2  2   4   6
t3  3   5   7
t4  4   6   8
t5  5   7   9

file2:

ts  x   y   z
t1  1   4   5
t2  2   6   6
t3  3   8   7
t4  4   10  8
t5  5   12  9

output:

      x    y    z
ts               
t1  1.0  3.5  5.0
t2  2.0  5.0  6.0
t3  3.0  6.5  7.0
t4  4.0  8.0  8.0
t5  5.0  9.5  9.0
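
For a self-contained check of the groupby approach, here is a minimal sketch that rebuilds the two sample tables above from in-memory strings instead of files. The names FILE1 and FILE2 and the use of io.StringIO are illustrative assumptions, as is the whitespace-separated layout. It also collects the frames in a list and concatenates them once rather than inside the loop.

```python
import io
import pandas as pd

# The two sample tables from above, as whitespace-separated text (assumed layout).
FILE1 = """ts x y z
t1 1 3 5
t2 2 4 6
t3 3 5 7
t4 4 6 8
t5 5 7 9
"""
FILE2 = """ts x y z
t1 1 4 5
t2 2 6 6
t3 3 8 7
t4 4 10 8
t5 5 12 9
"""

# Read each table into its own DataFrame, then concatenate once.
frames = [pd.read_csv(io.StringIO(text), sep=r"\s+") for text in (FILE1, FILE2)]
combined = pd.concat(frames, ignore_index=True)

# Average x, y and z over all rows that share the same time stamp.
mean_trajectory = combined.groupby("ts").mean()
print(mean_trajectory)
```

With real files, the io.StringIO(text) argument would be replaced by the file path.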

Upvotes: 2

Jonathan

Reputation: 76

So you want to average each coordinate over the frames, and you can hold all your frames as arrays in memory. Then you can store the whole trajectory as a single array in which one dimension represents the frames, another the moving elements (your current lines), and the last the axes (your current columns). Assuming your dimensions are in that order, you want the mean of that array over the first dimension: you can use my_array.mean(axis=0).

I got the same result on a test system with the following code as with your example:

from glob import glob
import numpy

file_list = glob('csv_frames/*')

position_data_list = []
for frame in file_list:
    position_data_list.append(numpy.loadtxt(frame, delimiter=','))
# Convert the list of arrays into a 3D array
position_data_list = numpy.asarray(position_data_list)

# Actually calculate the averaged coordinates
position_mean = position_data_list.mean(axis=0)

# If you really need each axis in its own array
position_x_mean = position_mean[:, 0]
position_y_mean = position_mean[:, 1]
position_z_mean = position_mean[:, 2]

In my example, I use numpy.loadtxt to read the CSV files. Depending on your files, you may have to tweak the arguments. You can also use pandas to read the file and extract an array from your DataFrame with the values attribute (or to_numpy() in recent pandas versions; the older as_matrix method has since been removed).

I built my test frames from a molecular dynamics simulation trajectory using MDAnalysis:

import numpy
import MDAnalysis as mda
from MDAnalysisTests.datafiles import TPR, XTC

# Read the trajectory
u = mda.Universe(TPR, XTC)
# Write each frame in a separate CSV file
for ts in u.trajectory:
    numpy.savetxt('csv_frames/frame_{}.csv'.format(ts.frame),
                  u.atoms.positions, delimiter=',')
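
The mean(axis=0) step described above can also be checked without any files. The following is a small sketch on synthetic data; the random arrays, their shapes and the variable names are made up for illustration. It stacks the trajectories into one 3D array and compares the result with the per-axis approach from the question.

```python
import numpy as np

# Three hypothetical trajectories, each with 4 time steps and x, y, z columns.
rng = np.random.default_rng(0)
trajectories = [rng.normal(size=(4, 3)) for _ in range(3)]

# Stack into shape (n_trajectories, n_steps, 3), then average over trajectories.
stacked = np.stack(trajectories)
mean_trajectory = stacked.mean(axis=0)  # shape (4, 3)

# Cross-check the x column against the per-axis approach from the question.
x_by_step = np.array([t[:, 0] for t in trajectories]).T
assert np.allclose(mean_trajectory[:, 0], x_by_step.mean(axis=1))
```

Note that np.stack requires every trajectory to have the same shape, so files with differing numbers of rows would need to be trimmed or padded first.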

Upvotes: 0

Shijo

Reputation: 9711

import pandas as pd

file_list = ['test1_1', 'test2_4', 'test3_1', 'test4_3', 'test1_3']
# Read every file and stack the rows into a single DataFrame.
# Note: read_csv treats the first row of each file as a header by default.
position_data_list = pd.concat([pd.read_csv(f) for f in file_list])

position_data_list.columns = ['X', 'Y', 'Z']
print(position_data_list["Y"].mean())
print(position_data_list["X"].mean())
print(position_data_list["Z"].mean())

input

5.742023, 0.193241, 2.874091
8.742023, 0.35, 2.78
23, 0.55, 2.89
7.742023, 0.65, .8274091

output

0.516666666667
13.1613486667
2.16580303333
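
The column means above collapse all rows into a single number per axis. If a per-row mean across files is wanted, as in the question, one possible variation (an assumption, not part of the answer above) is to group the concatenated frame by its row index. The sketch uses two made-up, header-less tables held in strings; FILE_A and FILE_B are hypothetical names.

```python
import io
import pandas as pd

# Two hypothetical header-less files with x, y, z columns.
FILE_A = "1.0, 2.0, 3.0\n4.0, 5.0, 6.0\n"
FILE_B = "3.0, 4.0, 5.0\n6.0, 7.0, 8.0\n"

frames = [
    pd.read_csv(io.StringIO(text), header=None, names=["X", "Y", "Z"],
                skipinitialspace=True)
    for text in (FILE_A, FILE_B)
]

# concat keeps each frame's own row index (0, 1, ...), so grouping on the
# index averages row 0 of every file together, then row 1, and so on.
per_row_mean = pd.concat(frames).groupby(level=0).mean()
print(per_row_mean)
```

This relies on every file listing its rows in the same order, since rows are matched purely by position.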

Upvotes: 0
