Reputation: 1645
I have a dataframe DF, with two columns A and B shown below:
A B
1 0
3 0
4 0
2 1
6 0
4 1
7 1
8 1
1 0
First part: A sliding window approach should be applied as shown below. I need to calculate the mean of column B in a sliding window of size 3 that advances by 1 position. The mean value for each window was calculated manually and is shown to the right of that window.
A: 1 3 4 2 6 4 7 8 1
B: 0 0 0 1 0 1 1 1 0
[0 0 0] 0
[0 0 1] 0.33
[0 1 0] 0.33
[1 0 1] 0.66
[0 1 1] 0.66
[1 1 1] 1
[1 1 0] 0.66
output: 0 0.33 0.33 0.66 0.66 1 1 1 0.66
Second part: Now, for each row/coordinate in column A, all windows containing that coordinate are considered and the highest mean value among them is retained, which gives the results shown in 'output'.
Detailed explanation for second part: The first part calculates the mean in a sliding window of size 3 sliding by 1 position. The second step is: for each coordinate 'i' in column A, all windows containing coordinate 'i' should be evaluated and the highest mean score retained. For example, coordinate 1 (the first row of column A) is present only in the first window, so its score is 0 (the mean of the first window). Similarly, coordinate 2 is present in the first and second windows, so its score is the higher of those two window scores, i.e. max(0, 0.33). Likewise, coordinate 3 is present in the first, second and third windows, so its score is max(0, 0.33, 0.33). Coordinate 4 is present in the second, third and fourth windows, so its score is max(0.33, 0.33, 0.66), and so on.
I need to obtain the output as shown above. The output should look like:
A B Output
1 0 0
3 0 0.33
4 0 0.33
2 1 0.66
6 0 0.66
4 1 1
7 1 1
8 1 1
1 0 0.66
Any help in Python would be highly appreciated.
Upvotes: 1
Views: 1567
Reputation: 23120
For the first part, using numpy:
import numpy
WS = 3  # window size
B = numpy.array([0,0,0,1,0,1,1,1,0])
filt = numpy.ones(WS) / WS  # uniform averaging kernel
mean = numpy.convolve(B, filt, 'valid')  # one mean per full window
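As a sanity check on the convolution, the same window means can be computed from explicit windows. This is only a sketch: it assumes NumPy 1.20 or later for `sliding_window_view`, and the names `mean_conv`, `windows` and `mean_explicit` are made up for illustration.

```python
import numpy
from numpy.lib.stride_tricks import sliding_window_view

WS = 3  # window size, as in the question
B = numpy.array([0, 0, 0, 1, 0, 1, 1, 1, 0])

# Convolution with a uniform kernel, as in the lines above
mean_conv = numpy.convolve(B, numpy.ones(WS) / WS, 'valid')

# Explicit windows: one row per run of WS consecutive values of B
windows = sliding_window_view(B, WS)
mean_explicit = windows.mean(axis=1)

print(windows[:3])                    # first three windows
print(numpy.round(mean_explicit, 2))  # one mean per window
```

Both approaches give one value per full window, so for 9 rows and a window of 3 there are 7 means.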
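For the second part:
# Pad with WS-1 zeros on each side; safe here because the means are never negative
paddedmean = numpy.zeros(mean.size + 2 * (WS - 1))
paddedmean[WS-1:-(WS-1)] = mean
# Row i is covered by windows i-WS+1 .. i, which sit at paddedmean[i:i+WS]
output = [numpy.max(paddedmean[i:i+WS]) for i in range(mean.size+WS-1)]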
But what is A used for?
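If the result is wanted as a column of the original dataframe, the two steps could also be written with pandas. This is a rough sketch, assuming the data sits in a DataFrame named `DF` as in the question; `win_mean` is a made-up helper name. Column A does not enter the computation and is simply carried along.

```python
import pandas as pd

WS = 3  # window size, as in the question
DF = pd.DataFrame({
    "A": [1, 3, 4, 2, 6, 4, 7, 8, 1],
    "B": [0, 0, 0, 1, 0, 1, 1, 1, 0],
})

# Step 1: mean of each full window, stored at the window's last row.
# The first WS-1 rows are NaN because no full window ends there.
win_mean = DF["B"].rolling(WS).mean()

# Step 2: row i belongs to the windows ending at rows i .. i+WS-1.
# Reversing the series turns this forward-looking maximum into an
# ordinary rolling max; min_periods=1 lets partial windows through,
# and the NaN entries are ignored by max.
DF["Output"] = win_mean[::-1].rolling(WS, min_periods=1).max()[::-1]

print(DF.round(2))
```

For the sample data this should reproduce the Output column from the question, with 0.66 appearing as 0.67 after rounding.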
Upvotes: 1