Looking for the difference between occurrences in a dataframe

Question

I have a dataframe like this (the real one has 7 million records and 345 features). The following image shows only a small fraction of it, indicating whether a client made an operation in a given month. What I want to do is create a column at the end with the mean difference between consecutive operations. For example, in the first record the mean difference would (probably) be 3.

By mean difference I mean this: between op1 and op4 there is a distance of 3, between op4 and op11 a distance of 7, between op11 and op15 a distance of 3, and so on. If we sum all the values we get 13; divided by the total number of operations, which are op1, op4, op11, op15 (4 operations), we get 3.25. That is what I refer to as the mean difference.

piRSquared · Accepted Answer

numpy.flatnonzero: Identify where the non-zero values are
numpy.diff: Find the difference between adjacent values. When passed results from flatnonzero it finds the differences between positions
numpy.mean: Find the average of values
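
As a minimal sketch of how these three calls chain together on a single row, assume a hypothetical 0/1 array in which a 1 marks a month with an operation (the values below are made up for illustration and are not the asker's data):

```python
import numpy as np

# Hypothetical row of 15 months; operations in months 1, 4, 11 and 15
row = np.zeros(15, dtype=int)
row[[0, 3, 10, 14]] = 1

positions = np.flatnonzero(row)  # indices of the non-zero entries: 0, 3, 10, 14
gaps = np.diff(positions)        # distances between adjacent positions: 3, 7, 4
mean_gap = gaps.mean()           # average of the gaps: 14 / 3
```

Note that `mean` divides by the number of gaps (three here), not by the number of operations.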

Produce a new column 'MD' with the average positional distance between non-zero values

df.assign(MD=[np.diff(np.flatnonzero(a)).mean() for a in df.to_numpy()])
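
For a self-contained way to try this, the sketch below applies the same expression to a small made-up frame. The column names, the data, and the `mean_gap` helper are assumptions for illustration. The guard for rows with fewer than two non-zero values is also an addition, not part of the one-liner above; without it, `mean` on an empty array returns `nan` and emits a RuntimeWarning.

```python
import numpy as np
import pandas as pd

# Hypothetical data: one row per client, one 0/1 column per month
df = pd.DataFrame(
    [[1, 0, 0, 1, 0, 1],
     [0, 1, 0, 0, 0, 1],
     [1, 0, 0, 0, 0, 0]],
    columns=[f"op{i}" for i in range(1, 7)],
)

def mean_gap(row):
    """Average positional distance between non-zero entries; NaN if fewer than two."""
    pos = np.flatnonzero(row)
    return np.diff(pos).mean() if pos.size > 1 else np.nan

# assign returns a new DataFrame and leaves df itself unchanged
result = df.assign(MD=[mean_gap(a) for a in df.to_numpy()])
```

Because `assign` does not modify `df` in place, the result has to be bound to a name (or back to `df`) to keep the new column.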

