Reputation: 561
I'm looking for a nice way to count the longest run of consecutive reductions within each row of a data.table (package version 1.9.2) in R. I am horribly lost and any help is much appreciated. For this example, a reduction is where a value is less than or equal to the previous value (<=).
Below is a toy sample of the data I am dealing with. I have also included my best attempt so far, which to be honest went horribly wrong and returned an error. My attempt also uses 2 for loops, which I'm not hugely keen on since I have been advised that apply loops are more often used in R. I have tried searching this site and the web for a similar solution but haven't had any luck. My full data table has just over 1 million rows and 17 columns.
library(data.table)
TEST_DF <- data.table(COL_1 = c(5,2,3,1), COL_2 = c(1,0,4,2),
COL_3 = c(0,1,6,3), COL_4 = c(0,0,0,4))
TEST_DF$COUNT <- as.numeric(0)
for( i in 1:NROW(TEST_DF))
{
for (j in 1:(NCOL(TEST_DF) - 1))
{
TEST_DF$COUNT[j] <- if (TEST_DF[i, j, with = FALSE] >=
TEST_DF[i, j + 1, with = FALSE])
{
TEST_DF$COUNT[j] + 2
}
}
}
DESIRED <- data.table(COL_1 = c(5,2,3,1), COL_2 = c(1,0,4,2),
COL_3 = c(0,1,6,3), COL_4 = c(0,0,0,4),
COUNT = c(4,2,1,0))
The desired output appears at the bottom of the code. As all four "COL" columns form the longest reduction sequence in the first row, its COUNT would be 4. In the second row, there is a reduction across the first two columns and another across the last two, but none in between, so its COUNT would be 2. In the third row, the only reduction is from COL_3 to COL_4, so its COUNT would be 1. In any row with no reduction, such as the last, COUNT would be 0.
Let me know if any further clarification or information is needed.
Thank you so much in advance.
Upvotes: 0
Views: 91
Reputation: 179448
You can use the functions diff() and rle() to build a function to extract the run lengths. Then use apply() across the rows of your data:
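foo <- function(x) {
  # TRUE wherever a value is <= the one before it; the first comparison is
  # repeated, which makes a run that starts in column 1 one longer
  runs <- rle(c(x[2] <= x[1], diff(x) <= 0))
  # Length of the longest run of TRUE, or 0 if the row has no reduction
  if (!any(runs$values)) 0 else max(runs$lengths[runs$values])
}
# TEST_DF here holds only the four COL_ columns (no COUNT column yet)
apply(TEST_DF, 1, foo)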
[1] 4 2 1 0
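To see what foo() does on a single row, it can help to print the intermediate values. The sketch below walks through the second row of TEST_DF (2, 0, 1, 0) in base R; the variable names x, is_reduction and runs are only illustrative.

```r
x <- c(2, 0, 1, 0)   # second row of TEST_DF

# diff(x) gives the change between neighbouring columns
diff(x)
# [1] -2  1 -1

# Same logical vector that foo() builds: the first comparison appears twice
is_reduction <- c(x[2] <= x[1], diff(x) <= 0)
is_reduction
# [1]  TRUE  TRUE FALSE  TRUE

# rle() collapses the vector into runs of identical values
runs <- rle(is_reduction)
runs$lengths
# [1] 2 1 1
runs$values
# [1]  TRUE FALSE  TRUE

# Longest run of TRUE values
max(runs$lengths[runs$values])
# [1] 2
```

The run of two TRUE values at the start is what produces the 2 for this row in the output above.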
Upvotes: 1
Reputation: 2244
I used apply with one for loop to accomplish what you're looking for. The apply acts on each row, and the for loop compares consecutive columns.
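COUNT <- rep(0, nrow(TEST_DF))
for (i in 1:(ncol(TEST_DF) - 1)) {
  # Add 1 for every row whose value in column i + 1 is no larger than in column i
  COUNT <- COUNT + apply(TEST_DF, 1, function(x) ifelse(x[i] >= x[i + 1], 1, 0))
}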
This produces 3, 2, 1, 0, as there are 3 reductions in the first row. The last column has nothing to compare to, so there can be at most three comparisons. I'm not sure why you want it to be 4?
If you want count to be part of your original table:
TEST_DF$COUNT<-COUNT
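The same per-row total can probably also be computed without an explicit loop, by comparing the first three columns with the last three in a single matrix operation. This is a minimal sketch, assuming TEST_DF holds only the four COL_ columns from the question; the names m and total_reductions are illustrative. Like the loop above, it counts every reduction in a row rather than the longest consecutive run.

```r
library(data.table)

TEST_DF <- data.table(COL_1 = c(5,2,3,1), COL_2 = c(1,0,4,2),
                      COL_3 = c(0,1,6,3), COL_4 = c(0,0,0,4))

# Work on a plain numeric matrix of the value columns
m <- as.matrix(TEST_DF)

# Compare each column with its right-hand neighbour in one step:
# m[, -ncol(m)] drops the last column, m[, -1] drops the first
total_reductions <- rowSums(m[, -ncol(m)] >= m[, -1])

# Add the result by reference with data.table's := operator
TEST_DF[, COUNT := total_reductions]
```

With over a million rows this avoids calling a function once per row, which is usually where apply() spends its time.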
Upvotes: 0