Logan Whitehouse

Reputation: 13

Vectorizing a column-by-column comparison to separate values

I'm working with data gathered from multi-channel electrode systems and am trying to make this code run faster, but I can't find a good way of doing it without loops.

The gist is this: I have a modified average for each column (each column is a channel), and I need to compare every value in a column to that column's average. If a value is above the adjusted mean, I need to put it in another data frame so it can be easily read.

Here is some sample code for the problematic bit:

# start from an empty vector so the code can be rerun without stale results
spikes <- numeric(0)
# timeFrame is just a subsection of the original data, 60 channels with upwards of a few million rows
for (i in seq_len(ncol(timeFrame))) {
  for (g in seq_len(nrow(timeFrame))) {
    if (timeFrame[g, i] >= posCompValues[i, 1]) {
      # append() returns a new vector, so the result must be assigned back
      spikes <- append(spikes, timeFrame[g, i])
    }
  }
}
readout <- data.frame(Values = spikes)

The data ranges from 500 thousand to upwards of 130 million readings, so if anyone could point me in the right direction I'd appreciate it.

Upvotes: 1

Views: 42

Answers (1)

Ben Bolker

Reputation: 226087

Something like this should work:

Return values of x greater than y:

cmpfun <- function(x,y) return(x[x>y])

For each element (column) of timeFrame, compare its values with the corresponding entry in the first column of posCompValues:

vals1 <- Map(cmpfun,timeFrame,posCompValues[,1])

Collapse the list into a single vector:

spikes <- unlist(vals1)
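
To see what the three steps above produce, here is a minimal sketch on made-up data. The two-channel timeFrame, the threshold values, and the column name adj_mean are invented for illustration and are not from the question. It also shows one possible way to recover the channel for each value from the list names.

```r
# Toy stand-ins for the real data (hypothetical values)
timeFrame <- data.frame(ch1 = c(1, 5, 3), ch2 = c(10, 2, 8))
posCompValues <- data.frame(adj_mean = c(2, 7))  # one threshold per channel

cmpfun <- function(x, y) x[x > y]

# Map() walks the columns of timeFrame and the thresholds in parallel
vals1 <- Map(cmpfun, timeFrame, posCompValues[, 1])

# Flatten to a single vector; names are built from the column names
spikes <- unlist(vals1)

# Channel label for each spike, repeated once per value kept in that channel
channel <- rep(names(vals1), lengths(vals1))
```

Because Map() keeps the column names, names(vals1) and lengths(vals1) are enough to tell which channel each value came from without writing an explicit loop.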

If you want to save both the value and the corresponding column it may be worth unpacking this a bit into a for loop:

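resList <- list()
for (i in seq_len(ncol(timeFrame))) {
   tt <- timeFrame[,i]
   # values in column i that exceed that column's threshold
   spikes <- tt[tt>posCompValues[i,1]]
   if (length(spikes)>0) {
      resList[[i]] <- data.frame(value=spikes,orig_col=i)
   }
}
# stack the per-column results into a single data frame
res <- do.call(rbind, resList)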
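
If all channels are numeric, a loop-free variant may also be worth trying. This is only a sketch of an alternative technique (matrix comparison with sweep() and which(arr.ind = TRUE)), not part of the approach above, and the toy data and the adj_mean column name are made up. Note that it builds a full logical matrix the size of the data, so memory use could matter at 130 million readings.

```r
# Toy stand-ins for the real data (hypothetical values)
timeFrame <- data.frame(ch1 = c(1, 5, 3), ch2 = c(10, 2, 8))
posCompValues <- data.frame(adj_mean = c(2, 7))  # one threshold per channel

m <- as.matrix(timeFrame)

# Compare every column against its own threshold in one call;
# the result is a logical matrix with the same shape as m
above <- sweep(m, 2, posCompValues[, 1], ">")

# Row/column positions of the TRUE cells, in column-major order
hit <- which(above, arr.ind = TRUE)

# Index the matrix by the two-column position matrix to pull the values
res <- data.frame(value = m[hit], orig_col = unname(hit[, "col"]))
```

The result has the same value and orig_col columns as the loop version, so downstream code should not need to change.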

Upvotes: 1
