Reputation: 361
I have a dataframe like this (the real one has 7 million records and 345 features). The following image shows only a small fraction of it: the columns that indicate whether a client made an operation in a given month. What I want to do is create a column at the end with the mean difference between consecutive operations. For example, in the first record the mean difference would (probably) be 3.
By mean difference I mean the following: between op1 and op4 there is a distance of 3, between op4 and op11 a distance of 7, between op11 and op15 a distance of 3, and so on. If we sum all these values we get 13; dividing by the total number of operations (op1, op4, op11, op15, so 4 operations) gives 3.25. That is what I refer to as the mean difference.
Upvotes: 2
Views: 51
Reputation: 294488
numpy.flatnonzero: Identify where the non-zero values are.
numpy.diff: Find the difference between adjacent values. When passed results from flatnonzero, it finds the differences between positions.
numpy.mean: Find the average of values.
Produce a new column 'MD' with the average positional distance between non-zero values:
import numpy as np

# For each row: positions of non-zero entries -> gaps between them -> mean gap
df.assign(MD=[np.diff(np.flatnonzero(a)).mean() for a in df.to_numpy()])
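To make the steps above concrete, here is a minimal sketch on a made-up frame. The column names `op1`..`op15`, the three sample rows, and the helper `mean_gap` are assumptions for illustration, since the real data was not posted. It also returns NaN explicitly for rows with fewer than two operations, where no gap exists.

```python
import numpy as np
import pandas as pd

# Hypothetical 0/1 table: one row per client, one column per month (op1..op15).
cols = [f"op{i}" for i in range(1, 16)]
rows = np.zeros((3, 15), dtype=int)
rows[0, [0, 3, 10, 14]] = 1   # operations in op1, op4, op11, op15
rows[1, [1, 2, 3]] = 1        # three consecutive months
rows[2, [5]] = 1              # a single operation, so no gap exists
df = pd.DataFrame(rows, columns=cols)

def mean_gap(row):
    """Mean positional distance between consecutive non-zero entries."""
    gaps = np.diff(np.flatnonzero(row))
    # Fewer than two operations means there is no gap to average;
    # return NaN explicitly instead of averaging an empty array.
    return gaps.mean() if gaps.size else np.nan

result = df.assign(MD=[mean_gap(a) for a in df.to_numpy()])
print(result["MD"].tolist())
```

For the first row the positions are 0, 3, 10 and 14, so the gaps are 3, 7 and 4 and the mean is 14 / 3. Note that this divides by the number of gaps; to divide by the number of operations instead, as the question describes, one could use `gaps.sum() / np.count_nonzero(row)`.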
Upvotes: 1
Reputation: 7224
This might work. If you can post your data so I can recreate the dataframe, I might be able to give you an exact answer, but try this:
summary_ave_data = df.copy()
summary_ave_data['mean'] = summary_ave_data.mean(numeric_only=True, axis=1)
summary_ave_data
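To see what this snippet computes, here is a small sketch on a made-up single-row frame (the column names and values are assumptions, since the real data was not posted). With 0/1 indicators, the row mean comes out as the share of months with an operation, which may or may not be the quantity the question is after.

```python
import pandas as pd

# Hypothetical 0/1 row: operations in months 1, 4, 11 and 15 out of 15.
cols = [f"op{i}" for i in range(1, 16)]
values = [1 if i in (1, 4, 11, 15) else 0 for i in range(1, 16)]
df = pd.DataFrame([values], columns=cols)

summary_ave_data = df.copy()
# Row-wise mean over the numeric columns.
summary_ave_data["mean"] = summary_ave_data.mean(numeric_only=True, axis=1)
row_mean = summary_ave_data["mean"].iloc[0]
print(row_mean)
```

Here the result is 4 / 15, i.e. four active months out of fifteen columns.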
Upvotes: 1