Reputation: 3391
I am trying to count the runs in a combined, sorted sample drawn from two random variables, x and y.
x <- rnorm(30, mean = 4, sd = 1)
y <- rnorm(20, mean = 2.5, sd = 1)
nx <- length(x)
ny <- length(y)
data <- c(x, y)
names(data) <- c(rep("x", nx), rep("y", ny))
data <- sort(data)
rank <- rank(data)
rbind(data, rank)
y y y y x y y x x y y x y
data 0.6814071 1.014124 1.049729 1.050243 1.164338 1.813754 1.955806 1.973856 2.013982 2.065336 2.402596 2.40338 2.445579
rank 1.0000000 2.000000 3.000000 4.000000 5.000000 6.000000 7.000000 8.000000 9.000000 10.000000 11.000000 12.00000 13.000000
x y x y x y x y x x x y
data 2.495905 2.533128 2.605192 2.675883 2.705004 2.740396 2.84131 2.841654 2.886925 3.024089 3.115692 3.246089
rank 14.000000 15.000000 16.000000 17.000000 18.000000 19.000000 20.00000 21.000000 22.000000 23.000000 24.000000 25.000000
x x x y x x y y x y x x
data 3.389303 3.398962 3.606407 3.657708 3.716344 3.763198 3.895701 3.944308 3.955861 3.9881 4.022458 4.075013
rank 26.000000 27.000000 28.000000 29.000000 30.000000 31.000000 32.000000 33.000000 34.000000 35.0000 36.000000 37.000000
x y x x x y x x x x x x
data 4.151537 4.21085 4.245625 4.355177 4.35652 4.409624 4.522272 4.541122 4.616041 4.619815 4.696114 4.988771
rank 38.000000 39.00000 40.000000 41.000000 42.00000 43.000000 44.000000 45.000000 46.000000 47.000000 48.000000 49.000000
x
data 5.591174
rank 50.000000
names(data)
[1] "y" "y" "y" "y" "x" "y" "y" "x" "x" "y" "y" "x" "y" "x" "y" "x" "y" "x" "y" "x" "y" "x" "x" "x" "y" "x" "x" "x" "y" "x" "x"
[32] "y" "y" "x" "y" "x" "x" "x" "y" "x" "x" "x" "y" "x" "x" "x" "x" "x" "x" "x"
In the final output [names(data)], each change from "y" to "x" (or from "x" to "y") should add 1 to the run count, so the first change gives 1, the next gives 2, and so on. A step from "x" to "x" or from "y" to "y" adds 0. I would like the total run count across all 50 values. I tried the "rle" function, but I could not get the output I wanted.
Thanks in advance.
Upvotes: 0
Views: 105
Reputation: 263421
So it would appear that the answer is:
length( rle(names(data))$values )
> rle(names(data))
Run Length Encoding
lengths: int [1:16] 6 1 7 1 2 2 1 2 1 1 ...
values : chr [1:16] "y" "x" "y" "x" "y" "x" "y" "x" "y" "x" "y" ...
> nx+ny
[1] 50
In that particular run the answer was 16, but in yours it was apparently larger. You should use set.seed(.) to construct reproducible examples.
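As a rough sketch of the same idea applied to the labels posted in the question (retyped by hand from the names(data) output, so worth double-checking): rle gives the number of runs, and comparing each label with its neighbour gives the number of x/y switches, which is always one less. The second half shows one way to make the example reproducible. The seed value 42 and the names labs, n_runs, n_switches and grp are my own choices, not something from the question.

```r
# Labels retyped from the names(data) output in the question,
# split into chunks only to keep the line short (50 labels in total).
labs <- strsplit(paste0("yyyyxyyxxyy", "xyxyxyxyxy", "xxxyxxxy",
                        "xxyyxy", "xxxyxxxy", "xxxxxxx"), "")[[1]]

# Number of runs: one entry in the rle result per run.
n_runs <- length(rle(labs)$lengths)

# Number of switches between "x" and "y": compare each label with the next one.
n_switches <- sum(head(labs, -1) != tail(labs, -1))

# Reproducible version of the simulation (the seed value is arbitrary).
set.seed(42)
x <- rnorm(30, mean = 4, sd = 1)
y <- rnorm(20, mean = 2.5, sd = 1)

# Attach group labels, then reorder them by the sorted data values.
grp <- c(rep("x", length(x)), rep("y", length(y)))[order(c(x, y))]
length(rle(grp)$lengths)
```

On the posted labels this gives 28 runs and 27 switches, so which number you want depends on whether you count the runs themselves or the changes between them.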
Upvotes: 4