How to speedup for and if loop in R

Question

In my current project, I have around 8.2 million rows. I want to scan for all rows and apply a certain function if the value of a specific column is not zero.

counter=1
for(i in 1:nrow(data)){
if(data[i,8]!=0){
totalclicks=sum(data$Clicks[counter:(i-1)])
test$Clicks[i]=totalclicks
counter=i
  }
}

In the above code, I am searching for the specific column over 8.2 million rows and if values are not zero then I will calculate sum over values. The problem is that for and if loops are too slow. It takes 1 hour for 50K rows. I heard that apply family is alternative for this. The following code also takes too long:

sapply(1:nrow(data), function(x) 
if(data[x,8]!=0){
totalclicks=sum(data$Clicks[counter:(x-1)])
test$Clicks[x]=totalclicks
counter=x
})

[Updated] Kindly consider the following as sample dataset:

clicks revenue new_column (sum of previous clicks)
  1      0       
  2      0
  3      5        3
  1      0
  4      0
  2      7        8

I want above kind of solution, in which I will go through all the rows. If any non-zero revenue value is encountered then it will add all previous values of clicks.

Am I missing something? Please correct me.

Miff · Accepted Answer

The aggregate() function can be used for splitting your long dataframe into chunks and performing operations on each chunk, so you could apply it in your example as:

data <- data.frame(Clicks=c(1,2,3,1,4,2),
                   Revenue=c(0,0,5,0,0,7),
                   new_column=NA)

sub_totals <- aggregate(data$Clicks, list(cumsum(data$Revenue)), sum)
data$new_column[data$Revenue != 0] <- head(sub_totals$x, -1)

How to speedup for and if loop in R

Answers (1)

Related Questions