Reputation: 115
I have a data set like this:
Just 1 table with 2 columns. The first column runs from 1 to 100 and in the second we have random numbers. for eg
x y
1 25
2 51
3 0
- --
48 250
49 500
50 1000
- --- --and so on till
100 600
Now , I need to choose a window of first 50 rows (x = 1 to x= 50).After this I need to find the range of these 50 y values, which is y[max]-y[min]. Then , I need to divide the range by 10 to create my frequency table. In the above example, range = 1000 - 0 = 1000. 1000/10 = 100.
so, my frequency table will look like
0 - 100 count value of y between 0 to 100 say i(0-100)
100 - 200 count value
and so on till
900-1000 count value.
I need to get total count value say " total ". After this, I need to take the ratio of respective count value to total count . For the first row , it would be i(0-100)/total. For the second row , it would be i(100-200)/total and so on. Lets call these values as f. so, f1 = i(0-100)/total ; f2 = i(0-100)/total and so on.
After this , I need to compute summation [ f * ln(f) ]. So, our return value is summation [ f * ln(f)] for the window from 1 to 50. After this , I need to slide this window by 1 position, which can be achieved by roll apply() function in R.
You can use this code to start :
library(zoo)
set.seed(1)
foo <- runif(100)
foo[c(58,59)] <- 0
rollapply(foo,width=50,FUN=function(xx){ ....})
I want some help to complete this "function(xx){ ....})" in the above code. I am struggling to put all the information that i said into this one small function.
Upvotes: 1
Views: 346
Reputation: 4126
Updated answer... I hope following will help you... Though its not exactly what you wanted
set.seed(1)
foo <- sample(1:1000,100)
#Rolling function
rollFreq = function(x, binCount){
temp = hist(x, breaks = binCount, plot = FALSE)
return = temp$counts
}
rollapply(foo, width=50, FUN=rollFreq, binCount = 10)
[,1] [,2] [,3] [,4] [,5] [,6] [,7] [,8] [,9] [,10]
[1,] 4 6 2 5 6 5 8 4 5 5
[2,] 4 6 1 5 6 5 8 4 6 5
[3,] 4 6 1 5 6 5 8 5 5 5
[4,] 4 6 1 5 5 5 8 6 5 5
[5,] 4 6 1 5 4 5 8 6 5 6
[6,] 4 6 1 5 4 5 7 6 5 7
[7,] 4 6 1 5 4 6 7 5 5 7
[8,] 5 5 1 5 4 6 7 5 5 7
[9,] 5 5 1 6 4 6 7 5 4 7
[10,] 5 5 2 6 4 6 7 5 4 6
[11,] 5 6 2 6 4 5 7 5 4 6
[12,] 6 6 2 6 3 5 7 5 4 6
[13,] 6 5 2 7 3 5 7 5 4 6
[14,] 5 5 2 8 3 5 7 5 4 6
[15,] 5 6 2 8 3 5 7 5 3 6
[16,] 5 5 2 8 3 6 7 5 3 6
[17,] 4 5 2 9 3 6 7 5 3 6
[18,] 4 5 2 9 3 6 7 5 4 5
[19,] 4 5 1 9 3 6 7 5 4 6
[20,] 4 4 1 10 3 6 7 5 4 6
[21,] 4 5 1 9 3 6 7 5 4 6
[22,] 4 6 1 9 2 6 7 5 4 6
[23,] 4 6 1 9 1 6 7 5 4 7
[24,] 4 6 1 10 1 5 7 5 4 7
[25,] 4 6 1 10 1 6 7 5 3 7
[26,] 3 6 1 10 1 6 8 5 3 7
[27,] 3 6 1 10 1 5 8 5 4 7
[28,] 3 6 1 10 1 6 7 5 4 7
[29,] 4 6 1 10 1 6 6 5 4 7
[30,] 4 6 1 10 1 6 5 5 5 7
[31,] 4 6 1 10 0 6 5 6 5 7
[32,] 4 6 1 10 0 6 5 6 5 7
[33,] 5 6 1 9 0 6 5 6 5 7
[34,] 6 6 1 9 0 6 5 6 4 7
[35,] 6 6 1 9 0 6 5 6 4 7
[36,] 6 6 1 9 0 5 5 7 4 7
[37,] 6 6 1 9 0 6 5 7 4 6
[38,] 6 7 1 9 0 6 5 7 4 5
[39,] 5 7 1 10 0 6 5 7 4 5
[40,] 5 7 1 10 0 6 6 6 4 5
[41,] 5 7 1 10 0 6 5 6 4 6
[42,] 5 6 1 10 0 6 6 6 4 6
[43,] 5 6 2 10 0 6 5 6 4 6
[44,] 5 6 2 10 0 6 5 6 4 6
[45,] 5 6 3 9 0 6 5 6 4 6
[46,] 5 6 3 9 0 6 4 6 5 6
[47,] 5 7 3 9 0 6 4 5 5 6
[48,] 5 7 3 10 0 6 3 5 5 6
[49,] 5 7 4 9 0 6 3 5 5 6
[50,] 6 6 4 9 0 6 3 5 5 6
[51,] 6 6 4 8 0 6 3 6 5 6
Upvotes: 1
Reputation: 13122
What I could make out from your description is wrapped in the following function:
ff = function(x)
{
rg = range(x)
f = prop.table(table(cut(x,
do.call(seq,
c(as.list(rg),
list(diff(rg) / 10))))))
sum(f * log(f))
}
Using your "foo" I get:
sapply(head(seq_along(foo), (50 - 1)),
function(i) ff(foo[i:(i + (50 - 1))]))
# [1] -2.247295 -2.231095 -2.240361 -2.227678 -2.239769
# [6] -2.244925 -2.239769 -2.223568 -2.246704 -2.251620
#[11] -2.238672 -2.245751 -2.251015 -2.244540 -2.244540
#[16] -2.248029 -2.235686 -2.226600 -2.258055 -2.271002
#[21] -2.278686 -2.255815 -2.251620 -2.251620 -2.231593
#[26] -2.215659 -2.207976 -2.192042 -2.192042 -2.195029
#[31] -2.194966 -2.174102 -2.168838 -2.138807 -2.118781
#[36] -2.127867 -2.127867 -2.130853 -2.130853 -2.143801
#[41] -2.173831 -2.181514 -2.163534 -2.190973 -2.167729
#[46] NaN NaN NaN NaN
Upvotes: 1