Reputation: 123
I'm wondering how I would find the difference between a number of columns in a pandas dataframe, while keeping other columns intact.
So if I have DataFrame, DF, I would want to find the difference between columns (val1, val2, val3), while retaining month and year. User type is not important, and can be removed.
df = pd.DataFrame({'mo': ['6', '6'],
...: 'yr': ['2017', '2017'],
...: 'user_type': ['a', 'b'],
...: 'val1': ['1', '10'],
...: 'val2': ['2', '20'],
...: 'val3': ['3', '30']},
...: index=[0, 1])
#### DF ####
| index | mo | yr | user_type | val1 | val2 | val3 |
|-------|----|------|-----------|------|------|------|
| 0 | 6 | 2017 | a | 1 | 2 | 3 |
| 1 | 6 | 2017 | b | 10 | 20 | 30 |
Is this possible? I attempted to use df.diff()
, which works if my dataframe contains only the three value columns, but not if I have the month and year columns.
Ideally, something like this would be my output
| index | mo | yr | val1 | val2 | val3 |
|-------|----|------|------|------|------|
| 0 | 6 | 2017 | 9 | 18 | 27 |
Any help is greatly appreciated.
Upvotes: 0
Views: 53
Reputation: 8033
df.groupby(['mo','yr'])['val1','val2','val3'].apply(lambda x : x.iloc[1]-x.iloc[0]).reset_index()
Output
mo yr val1 val2 val3
0 6 2017 9 18 27
Upvotes: 2