Chetan Arvind Patil

Reputation: 866

Optimize apply() and while() in R

I am using the data below for a comparative analysis. I wrote the code with apply() and while(), and although it works as expected, I haven't been able to optimize it further. On a larger data set the current run time is more than a couple of hours.

Here is a small example data set:

data_1

A B C D
2 1 3 2.5

data_2

P Q R S
3 2 4 5.5

Data

 A   B   C   D
1.0 0.5 1.3 1.5
1.5 1.2 5.5 3.5
1.1 0.5 1.3 1.5
1.5 1.2 5.5 3.5
1.5 1.2 5.5 3.5
1.1 0.5 1.3 1.5
1.5 1.2 5.5 3.5
1.0 0.5 1.3 1.5

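To make the example reproducible, the three objects can be built as follows. This assumes data_1 (lower thresholds) and data_2 (upper thresholds) are one-row data frames whose columns line up with the columns of Data by position (A with P, B with Q, and so on), which is how the loop below indexes them.

```r
# Assumption: data_1 and data_2 are one-row data frames whose columns line up
# by position with the columns of Data (A<->P, B<->Q, C<->R, D<->S).
data_1 <- data.frame(A = 2, B = 1, C = 3, D = 2.5)
data_2 <- data.frame(P = 3, Q = 2, R = 4, S = 5.5)

# read.table() parses the whitespace-separated table exactly as printed above
Data <- read.table(header = TRUE, text = "
 A   B   C   D
1.0 0.5 1.3 1.5
1.5 1.2 5.5 3.5
1.1 0.5 1.3 1.5
1.5 1.2 5.5 3.5
1.5 1.2 5.5 3.5
1.1 0.5 1.3 1.5
1.5 1.2 5.5 3.5
1.0 0.5 1.3 1.5
")

str(Data)
```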
Code

# Row counter 
rowLine <<- 0

# Set current column to first one
columnLine <<- 1

# Preserve column header and dimensions for final data
finalData <- Data

# Compare the current column against its thresholds (called once per column)
findThreshold <- function () {

  if ( columnLine <= ncol(Data) ){

    # Start row navigation at the first row
    rowLine  <<- 1

    # Navigate through rows
    while (rowLine <= nrow(Data)){

      # If outside threshold
      if ( (Data[rowLine, columnLine] < data_1[columnLine]) |
           (Data[rowLine, columnLine] > data_2[columnLine])){

        finalData[rowLine, columnLine] <<- 1

      } else {

        finalData[rowLine, columnLine] <<- 0

      }

      # Increment row counter
      rowLine <<- rowLine + 1

    }
  }

  # Increment column counter
  columnLine <<- columnLine + 1

}

# Apply
apply(Data, 2, function(x) findThreshold())

I also understand that using <<- inside loops, or inside functions called by apply(), is generally discouraged.
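For comparison, here is a minimal sketch of the same logic written without <<-. The helper name flag_outside() is hypothetical: it takes Data and the two threshold rows as arguments and returns the 0/1 result, and it compares a whole column at a time instead of stepping through rows with while(). Like the original loop, it assumes the thresholds line up with the columns of Data by position.

```r
# Hypothetical helper: flags values outside [lower, upper] column by column.
# Inputs come in as arguments and the result is returned, so no <<- is needed.
flag_outside <- function(Data, lower, upper) {
  lower <- unlist(lower, use.names = FALSE)  # one-row data frame -> numeric vector
  upper <- unlist(upper, use.names = FALSE)
  result <- Data                             # keeps column names and dimensions
  for (j in seq_along(Data)) {
    # Compare the whole column at once instead of looping over rows
    result[[j]] <- as.integer(Data[[j]] < lower[j] | Data[[j]] > upper[j])
  }
  result
}

# Example inputs (same values as in the question)
data_1 <- data.frame(A = 2, B = 1, C = 3, D = 2.5)
data_2 <- data.frame(P = 3, Q = 2, R = 4, S = 5.5)
Data <- data.frame(
  A = c(1.0, 1.5, 1.1, 1.5, 1.5, 1.1, 1.5, 1.0),
  B = c(0.5, 1.2, 0.5, 1.2, 1.2, 0.5, 1.2, 0.5),
  C = c(1.3, 5.5, 1.3, 5.5, 5.5, 1.3, 5.5, 1.3),
  D = c(1.5, 3.5, 1.5, 3.5, 3.5, 1.5, 3.5, 1.5)
)

finalData <- flag_outside(Data, data_1, data_2)
finalData
```

Passing inputs in and returning a value keeps the function free of side effects, which is the usual reason to avoid <<-. The remaining loop runs only once per column, and each comparison inside it is vectorized over all rows.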

Please suggest how I can improve this logic. Thanks.

Upvotes: 0

Views: 110

Answers (1)

thelatemail

Reputation: 93908

This sounds like a simple Map() exercise:

data.frame(Map(function(d,l,h) d < l | d > h, Data, data_1, data_2))
#     A     B    C     D
#1 TRUE  TRUE TRUE  TRUE
#2 TRUE FALSE TRUE FALSE
#3 TRUE  TRUE TRUE  TRUE
#4 TRUE FALSE TRUE FALSE
#5 TRUE FALSE TRUE FALSE
#6 TRUE  TRUE TRUE  TRUE
#7 TRUE FALSE TRUE FALSE
#8 TRUE  TRUE TRUE  TRUE

Just wrap the logical comparison in as.integer if you want a 0/1 output instead:

data.frame(Map(function(d,l,h) as.integer(d < l | d > h), Data, data_1, data_2))
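To see what Map() is doing, here is a sketch using the question's example values (data_1 and data_2 assumed to be one-row data frames). Map() walks Data, data_1 and data_2 in parallel by position, so each call receives one column of Data plus the matching lower and upper threshold; the column names of the result come from the first argument, Data.

```r
data_1 <- data.frame(A = 2, B = 1, C = 3, D = 2.5)
data_2 <- data.frame(P = 3, Q = 2, R = 4, S = 5.5)
Data <- data.frame(
  A = c(1.0, 1.5, 1.1, 1.5, 1.5, 1.1, 1.5, 1.0),
  B = c(0.5, 1.2, 0.5, 1.2, 1.2, 0.5, 1.2, 0.5),
  C = c(1.3, 5.5, 1.3, 5.5, 5.5, 1.3, 5.5, 1.3),
  D = c(1.5, 3.5, 1.5, 3.5, 3.5, 1.5, 3.5, 1.5)
)

# 0/1 flags for all columns (the one-liner from the answer)
flags <- data.frame(Map(function(d, l, h) as.integer(d < l | d > h),
                        Data, data_1, data_2))

# Map() pairs its arguments by position, so its first call is the same as
# comparing column A of Data with data_1$A and data_2$P:
first_call <- as.integer(Data$A < data_1$A | Data$A > data_2$P)
stopifnot(identical(flags$A, first_call))

# Column names are taken from the first argument (Data), not from data_2
names(flags)
```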

If your data are matrix objects to start with, you could use sweep:

sweep(Data, 2, data_1, FUN=`<`) | sweep(Data, 2, data_2, FUN=`>`)
#        A     B    C     D
#[1,] TRUE  TRUE TRUE  TRUE
#[2,] TRUE FALSE TRUE FALSE
#[3,] TRUE  TRUE TRUE  TRUE
#[4,] TRUE FALSE TRUE FALSE
#[5,] TRUE FALSE TRUE FALSE
#[6,] TRUE  TRUE TRUE  TRUE
#[7,] TRUE FALSE TRUE FALSE
#[8,] TRUE  TRUE TRUE  TRUE
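The sweep() line assumes Data is a numeric matrix and data_1/data_2 are plain numeric vectors. A sketch of that conversion from the data frames in the question, plus an equivalent base-R form that expands the thresholds with rep(..., each = nrow(M)) (a swapped-in alternative, not part of the answer), could look like this:

```r
Data <- data.frame(
  A = c(1.0, 1.5, 1.1, 1.5, 1.5, 1.1, 1.5, 1.0),
  B = c(0.5, 1.2, 0.5, 1.2, 1.2, 0.5, 1.2, 0.5),
  C = c(1.3, 5.5, 1.3, 5.5, 5.5, 1.3, 5.5, 1.3),
  D = c(1.5, 3.5, 1.5, 3.5, 3.5, 1.5, 3.5, 1.5)
)
data_1 <- data.frame(A = 2, B = 1, C = 3, D = 2.5)
data_2 <- data.frame(P = 3, Q = 2, R = 4, S = 5.5)

# Assumption: convert to a numeric matrix and plain threshold vectors first
M     <- as.matrix(Data)
lower <- unlist(data_1, use.names = FALSE)
upper <- unlist(data_2, use.names = FALSE)

# sweep() compares every element of column j with lower[j] / upper[j]
out_sweep <- sweep(M, 2, lower, FUN = `<`) | sweep(M, 2, upper, FUN = `>`)

# Equivalent without sweep(): rep(..., each = nrow(M)) expands the thresholds
# to match R's column-major storage, so element-wise comparison lines up
out_rep <- M < rep(lower, each = nrow(M)) | M > rep(upper, each = nrow(M))

stopifnot(identical(out_sweep, out_rep))
out_sweep + 0L  # 0/1 integer version of the logical matrix
```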

Upvotes: 3
