Reputation: 1285
I have df
like this:
I need to calculate difference between X1, X2, X3, X4 and X5
for each company, but I know only how to calculate difference between whole column.
df['dX1'] = df['X1'].shift(-1) - df['X1']
df['dX2'] = df['X2'].shift(-1) - df['X2']
df['dX3'] = df['X3'].shift(-1) - df['X3']
...
It's bad approach because it's substracting X1
on line 74
with X1
on line 73
(and that's 2 different companies, that makes no sense).
My question is, how to calculate difference between row values but for every period and for every company. E.g.
(I can load csvs one by one and calculate diff for each company, then merge all into one list, but it will take 2 days because i have 700 csv files).
Upvotes: 6
Views: 7246
Reputation: 862581
I think you need DataFrameGroupBy.diff
:
cols = ['X1','X2','X3','X4','X5']
df1 = df.groupby('company')[cols].diff(-1)
Upvotes: 7