ZspreDo
ZspreDo

Reputation: 1

Pandas - Create column with difference in values

I have the below dataset. How can create a new column that shows the difference of money for each person, for each expiry?

The column is yellow is what I want. You can see that it is the difference in money for each expiry point for the person. I highlighted the other rows in colors so it is more clear.

Thanks a lot.

Example

[enter image description here]

Upvotes: 0

Views: 97

Answers (2)

RichieV
RichieV

Reputation: 5183

You can solve it with groupby.diff

Take the dataframe

df = pd.DataFrame({
    'Day': [30, 30, 30, 30, 29, 29, 28, 28],
    'Name': ['John', 'Mike', 'John', 'Mike', 'John', 'Mike', 'John', 'Mike'],
    'Money': [100, 950, 200, 1000, 50, 50, 250, 1200],
    'Expiry': [1, 1, 2, 2, 1, 1, 2, 2]
})
print(df)

Which looks like

   Day  Name  Money  Expiry
0   30  John    100       1
1   30  Mike    950       1
2   30  John    200       2
3   30  Mike   1000       2
4   29  John     50       1
5   29  Mike     50       1
6   28  John    250       2
7   28  Mike   1200       2

And the code

# make sure we have dates in the order we want
df.sort_values('Day', ascending=False)

# groubpy and get the difference from the next row in each group
# diff(1) calculates the difference from the previous row, so -1 will point to the next
df['Difference'] = df.groupby(['Name', 'Expiry']).Money.diff(-1)

Output

   Day  Name  Money  Expiry  Difference
0   30  John    100       1        50.0
1   30  Mike    950       1       900.0
2   30  John    200       2       -50.0
3   30  Mike   1000       2      -200.0
4   29  John     50       1         NaN
5   29  Mike     50       1         NaN
6   28  John    250       2         NaN
7   28  Mike   1200       2         NaN

Upvotes: 0

angelsarebeautiful
angelsarebeautiful

Reputation: 118

import pandas as pd
import numpy as np

example = pd.DataFrame( data = {'Day': ['2020-08-30', '2020-08-30','2020-08-30','2020-08-30',
                                        '2020-08-29', '2020-08-29','2020-08-29','2020-08-29'],
                                'Name': ['John', 'Mike', 'John', 'Mike','John', 'Mike', 'John', 'Mike'],
                                'Money': [100, 950, 200, 1000, 50, 50, 250, 1200],
                                'Expiry': ['1Y', '1Y', '2Y','2Y','1Y','1Y','2Y','2Y']})

example_0830 = example[ example['Day']=='2020-08-30' ].reset_index()
example_0829 = example[ example['Day']=='2020-08-29' ].reset_index()

example_0830['key'] = example_0830['Name'] + example_0830['Expiry']
example_0829['key'] = example_0829['Name'] + example_0829['Expiry']
example_0829 = pd.DataFrame( example_0829, columns = ['key','Money'])

example_0830 = pd.merge(example_0830, example_0829, on = 'key')
example_0830['Difference'] = example_0830['Money_x'] - example_0830['Money_y']
example_0830 = example_0830.drop(columns=['key', 'Money_y','index'])

Result:

          Day  Name  Money_x Expiry  Difference
0  2020-08-30  John      100     1Y          50
1  2020-08-30  Mike      950     1Y         900
2  2020-08-30  John      200     2Y         -50
3  2020-08-30  Mike     1000     2Y        -200

If the difference is just derived from the previous date, you can just define a date variable in the beginning to find today(t) and previous day (t-1) to filter out original dataframe.

Upvotes: 1

Related Questions