Reputation: 17017
I have a pandas dataframe:
a = pd.DataFrame(rand(5,6)*10, index=pd.DatetimeIndex(start='2005', periods=5, freq='A'))
a.columns = pd.MultiIndex.from_product([('A','B'),('a','b','c')])
I want to subtract the row a['2005']
from a
. To do that I've tried this:
In [22]:
a - a.ix['2005']
Out[22]:
A B
a b c a b c
2005-12-31 0 0 0 0 0 0
2006-12-31 NaN NaN NaN NaN NaN NaN
2007-12-31 NaN NaN NaN NaN NaN NaN
2008-12-31 NaN NaN NaN NaN NaN NaN
2009-12-31 NaN NaN NaN NaN NaN NaN
Which obviously doesn't work because pandas is lining up the index while doing the operation. This works:
In [24]:
pd.DataFrame(a.values - a['2005'].values, index=a.index, columns=a.columns)
Out[24]:
A B
a b c a b c
2005-12-31 0.000000 0.000000 0.000000 0.000000 0.000000 0.000000
2006-12-31 -3.326761 -7.164628 8.188518 -0.863177 0.519587 -3.281982
2007-12-31 3.529531 -4.719756 8.444488 1.355366 7.468361 -4.023797
2008-12-31 3.139185 -8.420257 1.465101 -2.942519 1.219060 -5.146019
2009-12-31 -3.459710 0.519435 -1.049617 -2.779370 4.792227 -1.922461
But I don't want to have to form a new DataFrame every time I have to do this kind of operation. I've tried the apply() method like this: a.apply(lambda x: x-a['2005'].values)
but I get ValueError: cannot copy sequence with size 6 to array axis with dimension 5
So I'm not really sure how to proceed. Is there a simple way to do this that I am not seeing? I think there should be an easy way to do this in place so you don't have to construct a new dataframe each time. I also tried the sub()
method but the subtraction is only applied to the first row whereas I want to subtract the first row from each row in the dataframe.
Upvotes: 9
Views: 20599
Reputation: 31
For timestamp values to calculate how much time passed with respect to the start time use:
df['Time_column'].apply(lambda x: x-df.iloc[[0],[1]])
Where df.iloc[[0],[1]]
= the start time
Upvotes: 3
Reputation: 12878
Here is a more verbose simple break down of how to do this.
First make a simple DataFrame to make it easier to understand.
import numpy as np
import pandas as pd
#make a simple DataFrame
df = pd.DataFrame(np.fromfunction(lambda i, j: i+1 , (3, 3), dtype=int))
Which will look like this
# 1 1 1
# 2 2 2
# 3 3 3
Now get the values from the first row
first_row = df.iloc[[0]].values[0]
Now use apply() to subtract the first row from the rest of the rows.
df.apply(lambda row: row - first_row, axis=1)
The result will look like this. See that 1 was subtracted from each row
# 0 0 0
# 1 1 1
# 2 2 2
Upvotes: 4
Reputation: 880547
Pandas is great for aligning by index. So when you want Pandas to ignore the index, you need to drop the index. You can do that by converting the DataFrame a.loc['2005']
to a 1-dimensional NumPy array:
In [56]: a - a.loc['2005'].values.squeeze()
Out[56]:
A B
a b c a b c
2005-12-31 0.000000 0.000000 0.000000 0.000000 0.000000 0.000000
2006-12-31 0.325968 1.314776 -0.789328 -0.344669 -2.518857 7.361711
2007-12-31 0.084203 2.234445 -2.838454 -6.176795 -3.645513 8.955443
2008-12-31 3.798700 0.299529 1.303325 -2.770126 -1.284188 3.093806
2009-12-31 1.520930 2.660040 0.846996 -9.437851 -2.886603 6.705391
The squeeze
method converts the NumPy array, a.loc['2005']
, of shape to (1, 6)
to an array of shape (6,)
. This allows the array to be broadcasted (during the subtraction) as desired.
Upvotes: 13