Pandas: Calculate value changes with diff based on condition

Question

Expanding a bit on this question, I want to capture changes in values specifically when the previous column value is 0 or when the next column value is 0.

Given the following dataframe, tracking value changes from one column to the next using diff and aggregating these fluctuations in a new set of values is possible.

Item    Jan_20  Apr_20  Aug_20  Oct_20
Apple       3       4       4       4
Orange      5       5       1       2
Grapes      0       0       4       4
Berry       5       3       0       0
Banana      0       2       0       0

However, if I were to only capture such differences when the values being changed from one column to the next is either specifically from 0 or to 0 and tracking that as either new fruit or lost fruit, respectively, how would I do that?

Desired outcome:

Type                Jan_20  Apr_20  Aug_20  Oct_20
New Fruits              0        2       4      0
Lost Fruits             0        0       5      0

Put another way, in the example, since Grapes go from a value of 0 in Apr_20 to 4 in Aug_20, I want 4 to be captured and stored in New Fruits. Similarly, since Banana and Berry both go from a value higher than zero in Apr_20 to 0 in Aug_20, I want to aggregate those values in Lost Fruits.

How could this be achieved?

mozway · Accepted Answer

This can be achieved using masks to hide the non relevant data, combined with diff and sum:

d = df.set_index('Item')

# mask to select values equal to zero
m = d.eq(0)
# difference from previous date
d = d.diff(axis=1)

out = pd.DataFrame({'New' : d.where(m.shift(axis=1)).sum(),
                    'Lost': -d.where(m).sum()}
                   ).T

Output:

      Jan_20  Apr_20  Aug_20  Oct_20
New        0       2       4       0
Lost       0       0       5       0

Pandas: Calculate value changes with diff based on condition

Answers (1)

Related Questions