choldgraf

Reputation: 3689

Pass entire contents of a dataframe to a function in Pandas

I'm trying to rework much of my signal-processing analysis code to use DataFrames instead of numpy arrays. However, I'm having a hard time figuring out how to pass the whole matrix of a DataFrame to a function as a single unit.

E.g., if I'm computing the common average reference of a signal, I have something like:

avg = signal.mean(axis=1)
CAR = signal - avg

What I'd like to do is pass a pandas DataFrame to this function and have it return a DataFrame with the CAR as its values. I'd like to do this without returning an array and then converting it back into a DataFrame.

It sounds like when you use df.apply(), it goes row-wise or column-wise, and doesn't put in the whole matrix. I could alter the code of CAR to make this work, but it seems like it would slow it down quite a bit rather than just using numpy's code to do it all at once. It probably wouldn't make a big difference for computing the mean, but I foresee this being a problem with other functions in the future that might take longer.

Can anyone point me in the right direction?

EDIT: To clarify, I'm not just doing this for subtracting the mean; that was just a simple example. A more realistic example would be linearly filtering the array along axis 0. I'd like to use the scipy.signal filtfilt function to filter my array. This is quite easy if I can just pass it a tpts x feats matrix, but right now it seems that the only way to do it is column-wise using "apply".

Upvotes: 0

Views: 5432

Answers (2)

Garth5689

Reputation: 622

http://pandas.pydata.org/pandas-docs/dev/generated/pandas.DataFrame.apply.html DataFrame.apply will let you perform an operation on each row or each column of a DataFrame, and it passes any extra keyword arguments through to your function.

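import random
import pandas as pd

# 5 x 3 DataFrame of random values in [0, 10)
signal = pd.DataFrame([[10 * random.random() for _ in range(3)] for _ in range(5)])

def testm(frame, average=0):
    # with axis=1, frame is one row of the DataFrame (a Series)
    return frame - average

signal.apply(testm, average=signal.mean(), axis=1)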

results:

signal  

Out[57]: 
      0         1         2
0  5.566445  7.612070  8.554966
1  0.869158  2.382429  6.197272
2  5.933192  3.564527  9.805669
3  9.676292  1.707944  2.731479
4  5.319629  3.348337  6.476631

signal.mean()

Out[59]: 
0    5.472943
1    3.723062
2    6.753203
dtype: float64

signal.apply(testm,average=signal.mean(),axis=1)

Out[58]: 
          0         1         2
0  0.093502  3.889008  1.801763
1 -4.603785 -1.340632 -0.555932
2  0.460249 -0.158534  3.052466
3  4.203349 -2.015117 -4.021724
4 -0.153314 -0.374724 -0.276572

This will take the mean of each column, and subtract it from each value in that column.
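For this particular case the same result can probably be had without apply, since arithmetic between a DataFrame and a Series is already vectorized. The following is a minimal sketch under that assumption; the random data and the names centered_cols and car are made up for illustration. It also shows the row-wise variant, which is closer to the common average reference in the question.

```python
import numpy as np
import pandas as pd

rng = np.random.default_rng(0)
signal = pd.DataFrame(rng.random((5, 3)) * 10)

# Column-wise centering: the Series of column means is aligned on the
# column labels, so every row has the column means subtracted.
centered_cols = signal - signal.mean()

# Row-wise centering (a common-average-reference style operation):
# subtract each row's mean, aligning the Series on the row index.
car = signal.sub(signal.mean(axis=1), axis=0)
```

Both results are still DataFrames with the original index and columns, so no conversion back from an array is needed.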

Upvotes: -1

BrenBarn

Reputation: 251365

You can get the raw numpy array version of a DataFrame with df.values. However, in many cases you can just pass the DataFrame itself, since it still allows use of the normal numpy API (i.e., it has all the right methods).
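As a rough sketch of how this could look for the filtfilt case from the question: pull out the array with df.values, filter the whole matrix along axis 0 in one call, and rebuild a DataFrame with the original labels. The helper name filter_frame, the Butterworth coefficients, and the random data are illustrative assumptions, not something from the question.

```python
import numpy as np
import pandas as pd
from scipy.signal import butter, filtfilt

def filter_frame(df, b, a):
    """Filter all columns along axis 0 in one call; return a DataFrame."""
    filtered = filtfilt(b, a, df.values, axis=0)
    # Reattach the original row and column labels to the filtered array.
    return pd.DataFrame(filtered, index=df.index, columns=df.columns)

rng = np.random.default_rng(0)
# Hypothetical tpts x feats signal: 200 time points, 3 channels.
signal = pd.DataFrame(rng.standard_normal((200, 3)),
                      columns=["ch0", "ch1", "ch2"])

# Illustrative 4th-order low-pass filter with a normalized cutoff of 0.2.
b, a = butter(4, 0.2)
smoothed = filter_frame(signal, b, a)
```

The wrapper keeps the numpy/scipy code untouched and only handles the labels, so the same pattern should carry over to other functions that expect a plain 2-D array.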

Upvotes: 3
