Reputation:
Probably a naive question, but I am not able to get the logical hold on the problem. I have a dataset which looks like the same -
Date_1 Count Value
0 18/8/2018 220 10
1 24/8/2018 231 14
2 29/8/2018 272 9
3 20/7/2018 83 5
4 13/7/2018 53 5
5 24/7/2018 120 10
6 17/8/2018 196 1
7 26/7/2018 118 1
8 22/8/2018 236 11
9 19/7/2018 74 4
10 26/7/2018 118 12
11 10/8/2018 168 16
12 29/8/2018 272 2
13 5/7/2018 18 148
14 24/8/2018 231 11
15 3/8/2018 154 13
16 20/8/2018 245 1
17 31/8/2018 320 6
18 26/8/2018 287 8
19 10/7/2018 39 24
20 2/8/2018 143 12
21 2/8/2018 143 12
This is just a snippet of the csv that I have. Date corresponds to a particular date, count corresponds to the number of times the date is being repeated and Value is a information for each date. I want to create a new column Value_2 which would have the particular logic :
for a particular date it would be 3500-Value. So for example
for date 2/8/2018 as you can see 2 dates in the example the output should be -
Date_1 Count Value Value_2
2/8/2018 143 12 3488(3500-12)
2/8/2018 143 12 3476(3488-12)
and so on..
The dataset has a total of 20000 entries.
I am able to achieve the total for a particular date using the groupby
function in python, but not able to achieve this for a particular instance of a date.
Thanks
Upvotes: 0
Views: 37
Reputation: 10890
You can use
df['Value_2'] = 3500 - df.groupby('Date_1').Value.cumsum()
# Date_1 Count Value Value_2
# 0 18/8/2018 220 10 3490
# 1 24/8/2018 231 14 3486
# 2 29/8/2018 272 9 3491
# 3 20/7/2018 83 5 3495
# 4 13/7/2018 53 5 3495
# 5 24/7/2018 120 10 3490
# 6 17/8/2018 196 1 3499
# 7 26/7/2018 118 1 3499
# 8 22/8/2018 236 11 3489
# 9 19/7/2018 74 4 3496
# 10 26/7/2018 118 12 3487
# 11 10/8/2018 168 16 3484
# 12 29/8/2018 272 2 3489
# 13 5/7/2018 18 148 3352
# 14 24/8/2018 231 11 3475
# 15 3/8/2018 154 13 3487
# 16 20/8/2018 245 1 3499
# 17 31/8/2018 320 6 3494
# 18 26/8/2018 287 8 3492
# 19 10/7/2018 39 24 3476
# 20 2/8/2018 143 12 3488
# 21 2/8/2018 143 12 3476
Upvotes: 1