Reputation: 439
I have following pandas DataFrame and trying to create a new "Value_Diff" column where it calculates the difference between current "value" if lable=0 with the previous value where label=1. If label=1 it sets the "Value_Diff" equals to 0. This process needs to be repeated for each group and if first label in the group is equal to 0 it should leave the "Value_Diff" equal to 0 until it reaches first label=1 and then follow the same logic ( group C in this example)
I can write a for loop and if statement for each individual group to do this, however was wondering if there is a better way to do this with using groupby, lambda or any other function.
here is the input:
group Date Value label
A 2020-03-01 -117 1
A 2020-03-02 -121 0
A 2020-03-03 -122 0
A 2020-03-04 -122 1
B 2020-03-05 -118 1
B 2020-03-06 -122 0
B 2020-03-07 -124 0
B 2020-03-08 -126 0
B 2020-03-09 -126 1
C 2020-03-10 -130 0
C 2020-03-11 -140 0
C 2020-03-12 -150 1
C 2020-03-13 -160 0
Answer should look like this:
group Date Value label Value_Diff
A 2020-03-01 -117 1 0
A 2020-03-02 -121 0 4 (-117-(-121)=4)
A 2020-03-03 -122 0 1
A 2020-03-04 -122 1 0
B 2020-03-05 -118 1 0
B 2020-03-06 -122 0 4
B 2020-03-07 -124 0 2
B 2020-03-08 -126 0 2
B 2020-03-09 -126 1 0
C 2020-03-10 -130 0 0
C 2020-03-11 -140 0 0
C 2020-03-12 -150 1 0
C 2020-03-13 -160 0 10
Sorry my first output didn't actually reflect what I wanted, because @BENY provided the solution to this output I'll leave this here to help others with the same question. Here is how the actual output should look like.
group Date Value label Value_Diff
A 2020-03-01 -117 1 0
A 2020-03-02 -121 0 4 (-117-(-121)=4)
A 2020-03-03 -122 0 5 (-117-(-122)=5)
A 2020-03-04 -122 1 0
B 2020-03-05 -118 1 0
B 2020-03-06 -122 0 4 (-122-(-118)=4)
B 2020-03-07 -124 0 6 (-124-(-118)=6)
B 2020-03-08 -126 0 8 (-126-(-118)=8)
B 2020-03-09 -126 1 0
C 2020-03-10 -130 0 0
C 2020-03-11 -140 0 0
C 2020-03-12 -150 1 0
C 2020-03-13 -160 0 10 (-150-(-160)=10)
Upvotes: 2
Views: 940
Reputation: 11613
I find it's best to do these things one step at a time. I think this does (the second version) of what you want:
df['Value_Last'] = np.nan
df.loc[df.label == 1, 'Value_Last'] = df.loc[df.label == 1, 'Value']
df_grouped = df.groupby('group')
df['Value_Last'] = df_grouped['Value_Last'].ffill()
df['Value_Diff'] = (df['Value_Last'] - df['Value']).fillna(0)
group Date Value label Value_Last Value_Diff
0 A 2020-03-01 -117 1 -117.0 0.0
1 A 2020-03-02 -121 0 -117.0 4.0
2 A 2020-03-03 -122 0 -117.0 5.0
3 A 2020-03-04 -122 1 -122.0 0.0
4 B 2020-03-05 -118 1 -118.0 0.0
5 B 2020-03-06 -122 0 -118.0 4.0
6 B 2020-03-07 -124 0 -118.0 6.0
7 B 2020-03-08 -126 0 -118.0 8.0
8 B 2020-03-09 -126 1 -126.0 0.0
9 C 2020-03-10 -130 0 NaN 0.0
10 C 2020-03-11 -140 0 NaN 0.0
11 C 2020-03-12 -150 1 -150.0 0.0
12 C 2020-03-13 -160 0 -150.0 10.0
PS: If the 'Value'
column contains only integers then you can use .astype(int)
on the end of the last line to convert 'Value_Diff'
back to integers.
Upvotes: 1
Reputation: 323226
We can try with cummax
create the 1st condition, then use groupby
with diff
s = df.groupby('group').label.cummax()
df['new'] = -df.groupby('group').Value.diff().fillna(0).where(s==1 & df.label.ne(1),0)
df
Out[135]:
group Date Value label new
0 A 2020-03-01 -117 1 -0.0
1 A 2020-03-02 -121 0 4.0
2 A 2020-03-03 -122 0 1.0
3 A 2020-03-04 -122 1 -0.0
4 B 2020-03-05 -118 1 -0.0
5 B 2020-03-06 -122 0 4.0
6 B 2020-03-07 -124 0 2.0
7 B 2020-03-08 -126 0 2.0
8 B 2020-03-09 -126 1 -0.0
9 C 2020-03-10 -130 0 -0.0
10 C 2020-03-11 -140 0 -0.0
11 C 2020-03-12 -150 1 -0.0
12 C 2020-03-13 -160 0 10.0
Upvotes: 2