Mac
Mac

Reputation: 123

Finding the difference between two rows, over specific columns

I'm wondering how I would find the difference between a number of columns in a pandas dataframe, while keeping other columns intact.

So if I have DataFrame, DF, I would want to find the difference between columns (val1, val2, val3), while retaining month and year. User type is not important, and can be removed.

df = pd.DataFrame({'mo': ['6', '6'],
...:               'yr': ['2017', '2017'],
...:               'user_type': ['a', 'b'],
...:               'val1': ['1', '10'],
...:               'val2': ['2', '20'],
...:               'val3': ['3', '30']},
...:                 index=[0, 1])

#### DF ####
| index | mo | yr   | user_type | val1 | val2 | val3 |
|-------|----|------|-----------|------|------|------|
| 0     | 6  | 2017 | a         | 1    | 2    | 3    |
| 1     | 6  | 2017 | b         | 10   | 20   | 30   |

Is this possible? I attempted to use df.diff(), which works if my dataframe contains only the three value columns, but not if I have the month and year columns.

Ideally, something like this would be my output

| index | mo | yr   | val1 | val2 | val3 |
|-------|----|------|------|------|------|
| 0     | 6  | 2017 | 9    | 18   | 27   |

Any help is greatly appreciated.

Upvotes: 0

Views: 53

Answers (1)

moys
moys

Reputation: 8033

df.groupby(['mo','yr'])['val1','val2','val3'].apply(lambda x : x.iloc[1]-x.iloc[0]).reset_index()

Output

    mo  yr    val1  val2    val3
0   6   2017    9   18       27

Upvotes: 2

Related Questions