Reputation: 167
Expanding a bit on this question, I want to capture changes in values specifically when the previous column value is 0 or when the next column value is 0.
Given the following dataframe, tracking value changes from one column to the next using diff and aggregating these fluctuations in a new set of values is possible.
Item Jan_20 Apr_20 Aug_20 Oct_20
Apple 3 4 4 4
Orange 5 5 1 2
Grapes 0 0 4 4
Berry 5 3 0 0
Banana 0 2 0 0
However, if I were to only capture such differences when the values being changed from one column to the next is either specifically from 0 or to 0 and tracking that as either new fruit or lost fruit, respectively, how would I do that?
Desired outcome:
Type Jan_20 Apr_20 Aug_20 Oct_20
New Fruits 0 2 4 0
Lost Fruits 0 0 5 0
Put another way, in the example, since Grapes
go from a value of 0 in Apr_20
to 4 in Aug_20
, I want 4 to be captured and stored in New Fruits
. Similarly, since Banana
and Berry
both go from a value higher than zero in Apr_20
to 0 in Aug_20
, I want to aggregate those values in Lost Fruits
.
How could this be achieved?
Upvotes: 0
Views: 229
Reputation: 260790
This can be achieved using masks to hide the non relevant data, combined with diff
and sum
:
d = df.set_index('Item')
# mask to select values equal to zero
m = d.eq(0)
# difference from previous date
d = d.diff(axis=1)
out = pd.DataFrame({'New' : d.where(m.shift(axis=1)).sum(),
'Lost': -d.where(m).sum()}
).T
Output:
Jan_20 Apr_20 Aug_20 Oct_20
New 0 2 4 0
Lost 0 0 5 0
Upvotes: 1