Reputation: 3689
I'm trying to rework much of my signal-processing analysis code to use pandas DataFrames instead of numpy arrays. However, I'm having a hard time figuring out how to pass the entire matrix of a DataFrame to a function as a single unit.
E.g., if I'm computing the common average reference (CAR) of a signal, I have something like:
avg = signal.mean(axis=1, keepdims=True)  # keepdims so the (tpts, 1) mean broadcasts against (tpts, feats)
CAR = signal - avg
What I'd like to do is pass a pandas DataFrame to this function and have it return a DataFrame holding the CAR values. I'd like to do this without returning an array and then converting it back into a DataFrame.
It sounds like df.apply() goes row-wise or column-wise and doesn't pass in the whole matrix. I could alter the CAR code to make this work, but that seems likely to be quite a bit slower than letting numpy do it all at once. It probably wouldn't make a big difference for computing the mean, but I foresee this being a problem with other functions in the future that might take longer.
Can anyone point me in the right direction?
EDIT: To clarify, I'm not just doing this to subtract the mean; that was just a simple example. A more realistic example would be linearly filtering the array along axis 0. I'd like to use the scipy.signal filtfilt function to filter my array. This is quite easy if I can just pass it a tpts x feats matrix, but right now it seems that the only way to do it is column-wise using "apply".
Upvotes: 0
Views: 5432
Reputation: 622
See DataFrame.apply: http://pandas.pydata.org/pandas-docs/dev/generated/pandas.DataFrame.apply.html. It lets you apply a function to every row (or every column) of the DataFrame.
import random
import pandas as pd

signal = pd.DataFrame([[10 * random.random() for _ in range(3)] for _ in range(5)])

def testm(frame, average=0):
    # with axis=1, frame is a single row passed in as a Series
    return frame - average

signal.apply(testm, average=signal.mean(), axis=1)
results:
signal
Out[57]:
0 1 2
0 5.566445 7.612070 8.554966
1 0.869158 2.382429 6.197272
2 5.933192 3.564527 9.805669
3 9.676292 1.707944 2.731479
4 5.319629 3.348337 6.476631
signal.mean()
Out[59]:
0 5.472943
1 3.723062
2 6.753203
dtype: float64
signal.apply(testm,average=signal.mean(),axis=1)
Out[58]:
0 1 2
0 0.093502 3.889008 1.801763
1 -4.603785 -1.340632 -0.555932
2 0.460249 -0.158534 3.052466
3 4.203349 -2.015117 -4.021724
4 -0.153314 -0.374724 -0.276572
This will take the mean of each column, and subtract it from each value in that column.
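For this particular case the same result should be obtainable without apply, because pandas aligns a Series with the columns of a DataFrame when you subtract one from the other. A minimal sketch, assuming a small made-up frame (the values and the names `centered` and `car` are illustrative only):

```python
import numpy as np
import pandas as pd

# Hypothetical 5 x 3 signal (time points x features); the values are made up.
signal = pd.DataFrame(np.arange(15, dtype=float).reshape(5, 3))

# Subtracting a Series from a DataFrame aligns the Series index with the
# DataFrame columns, so each column has its own mean removed.
centered = signal - signal.mean()

# Row-wise variant (mean across features at each time point), as in the
# question: sub(..., axis=0) aligns the Series with the row index instead.
car = signal.sub(signal.mean(axis=1), axis=0)
```

Both results are DataFrames with the original index and columns, so nothing has to be converted back.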
Upvotes: -1
Reputation: 251365
You can get the raw numpy array version of a DataFrame with df.values. However, in many cases you can just pass the DataFrame itself, since it still allows use of the normal numpy API (i.e., it has all the right methods).
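As a sketch of how this could look for the filtering case in the question, the whole array can be handed to filtfilt in one call and the result wrapped back up with the original labels. This assumes SciPy is installed; the helper name `filter_frame`, the Butterworth order and cutoff, and the random test data are all illustrative choices, not something from the original post:

```python
import numpy as np
import pandas as pd
from scipy.signal import butter, filtfilt

def filter_frame(df, b, a):
    """Zero-phase filter every column of df along axis 0 in a single call."""
    # filtfilt takes an N-dimensional array plus an axis argument,
    # so there is no need to loop over columns with apply.
    filtered = filtfilt(b, a, df.values, axis=0)
    # Re-attach the original row and column labels.
    return pd.DataFrame(filtered, index=df.index, columns=df.columns)

# Made-up data: 200 time points x 3 features.
rng = np.random.default_rng(0)
signal = pd.DataFrame(rng.standard_normal((200, 3)), columns=["ch0", "ch1", "ch2"])

# Illustrative 4th-order low-pass Butterworth filter (normalized cutoff 0.2).
b, a = butter(4, 0.2)
smoothed = filter_frame(signal, b, a)
```

Wrapping the array back into a DataFrame only re-attaches the index and column labels, so the filtering itself still runs as one numpy-level call over the full tpts x feats matrix.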
Upvotes: 3