Tendero

Reputation: 1166

Can this loop be sped up in R?

I've just started learning R this week, so I'm pretty bad at it. I've written a function that receives three parameters, and I want to perform the following operation:

for (k in 1:nrow(df_t)){
  df_t$colv[k] = link_targets(data = df,
                              target_date = df_t$mtime[k],
                              tag = tag)
}

So basically what I'm trying to do is apply a function to each element of a certain column of df_t, and the value the function returns depends on another column of that same data frame. (The function returns a scalar value).

I was wondering if this could be vectorized to avoid using the loop, which appears to be really slowing down the code.

Let me know if you need any further information to help me out with this.

EDIT:

The function I call in the loop is the following:

link_targets = function (data, target_date, tag){
  # Delete all rows that don't have the tag as name
  data[data$NAME != as.character(unlist(tag[1])),] = NA
  data = na.omit(data)
  # Delete all rows that do not correspond to the dates of the tag
  limit_time_1 = target_date - as.numeric(60 * tag[2] - 60)
  limit_time_2 = target_date - as.numeric(60 * tag[3])
  data[(data$IP_TREND_TIME < min(limit_time_1,limit_time_2))
       | (data$IP_TREND_TIME > max(limit_time_1,limit_time_2)),] = NA
  data = na.omit(data)
  mean_data = mean(as.numeric(data$IP_TREND_VALUE))
  return(mean_data)
}

I'm working with data tables. df is like this:

             NAME       IP_TREND_TIME IP_TREND_VALUE
       1: TC241-1 2018-03-06 12:05:31      194.57875
       2: TC241-1 2018-03-05 17:54:05       196.5219
       3: TC241-1 2018-03-05 05:02:18       211.4066
       4: TC241-1 2018-03-04 03:06:57      211.92874
       5: TC241-1 2018-03-03 06:41:17      205.43651
      ---                                           
13353582: DI204-4 2017-04-06 17:43:41     0.88308918
13353583: DI204-4 2017-04-06 17:43:31     0.88305187
13353584: DI204-4 2017-04-06 17:43:21     0.88303399
13353585: DI204-4 2017-04-06 17:43:11     0.88304734
13353586: DI204-4 2017-04-06 17:43:01     0.88305187

The tag array contains the word I want to look for in the column NAME, and two numbers that represent the time range I want. So for example:

     tag  start end
1 TC204-1    75 190

The output I'm looking for (df_t) would be something like this:

              TREND_TIME TREND_VALUE         colv 
  1: 2018-03-05 05:35:00   1.9300001     16.86248 
  2: 2018-03-05 02:21:00        1.95     18.04356 
  3: 2018-03-04 22:35:00        1.98     17.85405 
  4: 2018-03-04 17:01:00           2     17.87318 
  5: 2018-03-04 12:49:00        2.05     18.10455
 ---                                                      
940: 2017-04-07 15:01:00   2.1500001     20.14933 
941: 2017-04-07 09:27:00         1.9     20.19337    
942: 2017-04-07 04:46:00        1.95     20.20166    
943: 2017-04-07 01:34:00   2.0699999     20.20883    
944: 2017-04-06 21:46:00         1.9     20.15735 

Where colv contains the mean value of all the values in the column IP_TREND_VALUE corresponding to the selected tag and within the range of time determined by the numbers in tag, based on the time in TREND_TIME in df_t.
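
To make the time window concrete, here is a small sketch of the arithmetic link_targets performs for a single row, using the example tag and the first TREND_TIME shown above. The UTC time zone is an assumption (the time zone of the data is not stated), and `window` is just an illustrative name.

```r
# Worked example of the window for one row of df_t.
# Assumption: times are in UTC; the real data may use another time zone.
target_date <- as.POSIXct("2018-03-05 05:35:00", tz = "UTC")
tag <- data.frame(tag = "TC204-1", start = 75, end = 190)

# Subtracting a number from a POSIXct value subtracts that many seconds.
limit_time_1 <- target_date - as.numeric(60 * tag[[2]] - 60)  # 4440 s = 74 minutes back
limit_time_2 <- target_date - as.numeric(60 * tag[[3]])       # 11400 s = 190 minutes back

# The rows that are kept have IP_TREND_TIME between these two limits (inclusive).
window <- range(c(limit_time_1, limit_time_2))
print(window)
```

So for this row, colv would be the mean of IP_TREND_VALUE for the tag between 190 and 74 minutes before TREND_TIME.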

Upvotes: 0

Views: 94

Answers (1)

minem

Reputation: 3650

It is hard to suggest a better solution because I find your logic and explanation difficult to follow. Maybe you could create a smaller, clearer example that shows what you are trying to accomplish.

But you should be able to replace the link_targets function with this one:

link_targets <- function(data, target_date, tag) {
  # Two ends of the time window, in seconds before target_date
  limit_time_1 <- target_date - as.numeric(60 * tag[2] - 60)
  limit_time_2 <- target_date - as.numeric(60 * tag[3])
  x <- c(limit_time_1, limit_time_2)
  # Logical index: rows that belong to the tag
  i1 <- data$NAME == as.character(unlist(tag[1]))
  # Logical index: rows inside the time window (inclusive)
  i2 <- (data$IP_TREND_TIME >= min(x)) & (data$IP_TREND_TIME <= max(x))
  mean_data <- mean(as.numeric(data$IP_TREND_VALUE[i1 & i2]))
  return(mean_data)
}

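and you should see a large speed improvement, because it selects rows with logical indices instead of overwriting rows with NA and calling na.omit twice.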
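
As a quick sanity check, here is a minimal sketch that runs the function above on made-up data. The names df_toy and tag_toy and all values are hypothetical, chosen only so the result is easy to verify by hand; a plain data.frame is used instead of a data.table to keep the example free of package dependencies.

```r
# Made-up data, only to illustrate the logical-index filter (not the real df).
t0 <- as.POSIXct("2018-03-05 00:00:00", tz = "UTC")
df_toy <- data.frame(
  NAME = c("A", "A", "A", "B"),
  IP_TREND_TIME = t0 + c(0, 600, 1200, 600),  # 00:00, 00:10, 00:20, 00:10
  IP_TREND_VALUE = c(10, 20, 30, 99)
)
# start = 1 and end = 15 give a window from 15 minutes back up to target_date.
tag_toy <- data.frame(tag = "A", start = 1, end = 15)

# Same function as above.
link_targets <- function(data, target_date, tag) {
  limit_time_1 <- target_date - as.numeric(60 * tag[2] - 60)
  limit_time_2 <- target_date - as.numeric(60 * tag[3])
  x <- c(limit_time_1, limit_time_2)
  i1 <- data$NAME == as.character(unlist(tag[1]))
  i2 <- (data$IP_TREND_TIME >= min(x)) & (data$IP_TREND_TIME <= max(x))
  mean(as.numeric(data$IP_TREND_VALUE[i1 & i2]))
}

# Window is 00:00 to 00:15, so only the "A" rows at 00:00 and 00:10 are averaged.
res <- link_targets(df_toy, t0 + 900, tag_toy)
print(res)
```

The "B" row falls inside the window but is excluded by the name index, and the "A" row at 00:20 is excluded by the time index.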

Update

Maybe this function will be faster on your particular data, since it compares times only for the rows that already match the tag:

link_targets2 <- function(data, target_date, tag) {
  limit_time_1 <- target_date - as.numeric(60 * tag[[2]] - 60)
  limit_time_2 <- target_date - as.numeric(60 * tag[[3]])
  x <- c(limit_time_1, limit_time_2)
  # Filter by tag first, so the time comparison runs on fewer rows
  i1 <- data$NAME == as.character(unlist(tag[1]))
  xx <- data$IP_TREND_TIME[i1]
  i2 <- (xx >= min(x)) & (xx <= max(x))
  mean_data <- mean(as.numeric(data$IP_TREND_VALUE[i1][i2]))
  return(mean_data)
}
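
Since the loop in the question passes the same tag on every iteration, one further idea is to do the tag filter once, outside the loop, and only repeat the time comparison per row. The sketch below assumes that tag really is constant across rows of df_t; the toy df, df_t and tag are made up, and window_mean is a hypothetical helper name. Note that vapply here is mostly a tidier way to write the loop; any speed gain should come from not repeating the NAME comparison.

```r
# Hypothetical toy inputs standing in for df, df_t and tag.
t0 <- as.POSIXct("2018-03-05 00:00:00", tz = "UTC")
df <- data.frame(
  NAME = c("A", "A", "A", "B"),
  IP_TREND_TIME = t0 + c(0, 600, 1200, 600),
  IP_TREND_VALUE = c(10, 20, 30, 99)
)
df_t <- data.frame(mtime = t0 + c(900, 1500))  # 00:15 and 00:25
tag <- data.frame(tag = "A", start = 1, end = 15)

# Filter by tag once (assumption: tag does not change between rows of df_t).
keep <- df$NAME == as.character(tag[[1]])
times <- as.numeric(df$IP_TREND_TIME[keep])    # seconds since the epoch
values <- as.numeric(df$IP_TREND_VALUE[keep])

# Window offsets in seconds, same arithmetic as link_targets2.
off <- c(60 * tag[[2]] - 60, 60 * tag[[3]])

# Mean of the pre-filtered values inside the window for one target time
# (given in seconds since the epoch). Returns NaN if the window is empty.
window_mean <- function(target_sec) {
  lo <- target_sec - max(off)
  hi <- target_sec - min(off)
  mean(values[times >= lo & times <= hi])
}

df_t$colv <- vapply(as.numeric(df_t$mtime), window_mean, numeric(1))
print(df_t)
```

On the real data this avoids scanning the NAME column of the full table once per row of df_t, but I have not measured it on your data.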

Upvotes: 1
