Reducing Row Sequences In R Lengths

Question

I'm looking for a nice way to find the longest run of consecutive reductions in each row of a data.table (package version 1.9.2) in R. I am horribly lost and any help is much appreciated. For the example I am trying to do, a reduction is where a value is less than or equal to the previous value (<=).

Below is a toy sample of the data I am dealing with. I have also included my best attempt so far, which to be honest went horribly wrong and returned an error. My attempt also uses two for loops, which I'm not hugely keen on since I have been advised that the apply functions are more commonly used in R. I have tried searching this site and the web for a similar solution but haven't had any luck. My full data table has just over 1 million rows and 17 columns.

library(data.table)

TEST_DF <- data.table(COL_1 = c(5,2,3,1), COL_2 = c(1,0,4,2), 
                      COL_3 = c(0,1,6,3), COL_4 = c(0,0,0,4))

TEST_DF$COUNT <- as.numeric(0)

for( i in 1:NROW(TEST_DF))
{
  for (j in 1:(NCOL(TEST_DF) - 1))
  {
    TEST_DF$COUNT[j] <- if (TEST_DF[i, j, with = FALSE] >= 
                            TEST_DF[i, j + 1, with = FALSE])
                        {
                            TEST_DF$COUNT[j] + 2
                        }
  }
}

DESIRED <- data.table(COL_1 = c(5,2,3,1), COL_2 = c(1,0,4,2), 
                      COL_3 = c(0,1,6,3), COL_4 = c(0,0,0,4),
                      COUNT = c(4,2,1,0))

The desired output appears at the bottom of the code. As all four "COL" columns appear in the longest reduction sequence, the COUNT column for the first row would get a value of 4. In the second row, there is a reduction in the first two columns and in the last two but none in between, so COUNT would get a value of 2 for this row. In the third row, there is a reduction from COL_3 to COL_4, so COUNT would get a value of 1 for this row. In any row where there is no reduction, such as the last, COUNT would get a value of 0.

Let me know if any further clarification or information is needed.

Thank you so much in advance.

Andrie · Accepted Answer

You can use the functions diff() and rle() to build a function to extract the run lengths. Then use apply() across the rows of your data:

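foo <- function(x) {
  # TRUE where a value is <= the previous one; the first comparison is
  # repeated so a run starting in the first column also counts that column
  runs <- rle(c(x[2] <= x[1], diff(x) <= 0))
  # longest run of TRUE, or 0 if the row has no reduction at all
  if (!any(runs$values)) 0 else max(runs$lengths[runs$values])
}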

apply(TEST_DF, 1, foo)

[1] 4 2 1 0
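
To see why this works, it may help to trace the intermediate values for a single row. The sketch below walks through the second row of the example (2, 0, 1, 0) and then applies the same logic to all four rows. It uses a plain base R matrix rather than a data.table so that it runs without extra packages. The names `longest_run`, `drops`, `flags`, `m` and `counts` are hypothetical and are not part of the original answer.

```r
# Second row of the example data
x <- c(2, 0, 1, 0)

# TRUE wherever a value is <= the one before it (one entry per adjacent pair)
drops <- diff(x) <= 0            # TRUE FALSE TRUE

# Repeat the first comparison, as the answer's function does, so that a run
# starting in the first column also counts that column
flags <- c(x[2] <= x[1], drops)  # TRUE TRUE FALSE TRUE

# rle() collapses the flags into runs of identical values
runs <- rle(flags)
runs$lengths                     # 2 1 1
runs$values                      # TRUE FALSE TRUE

# Same logic wrapped in a function: longest run of TRUE, or 0 if there is none
longest_run <- function(x) {
  runs <- rle(c(x[2] <= x[1], diff(x) <= 0))
  if (any(runs$values)) max(runs$lengths[runs$values]) else 0
}

# The four example rows as a matrix, COL_ columns only
m <- cbind(COL_1 = c(5, 2, 3, 1), COL_2 = c(1, 0, 4, 2),
           COL_3 = c(0, 1, 6, 3), COL_4 = c(0, 0, 0, 4))

counts <- apply(m, 1, longest_run)
counts                           # 4 2 1 0
```

If the COUNT column from the question has already been added to TEST_DF, it would need to be left out of the call to apply(), since otherwise it is compared as if it were a fifth data column. With data.table, one way to do this should be to pass only the COL_ columns through .SDcols, for example TEST_DF[, COUNT := apply(.SD, 1, longest_run), .SDcols = paste0("COL_", 1:4)].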
