How to vectorize this R function when elements depend on other elements in dataframe

Question

Consider this dataframe :

col1 | col2
  1  |  1 
  1  |  2
  1  |  3
  2  |  4
  2  |  5
  2  |  6

I want to add a new column, say col3, to the dataframe, with the following definition: the ith element col3[i] is the mean of all values col2[j], for all j such that col1[i] == col1[j] && i != j.

The for loop for it goes like this :

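data$col3 <- NA_real_    # create the column before filling it
for (i in seq_len(nrow(data)))
{
    total <- 0
    count <- 0
    for (j in seq_len(nrow(data)))
    {
        # use every other row that shares row i's col1 value
        if (data$col1[j] == data$col1[i] && i != j)
        {
            total <- total + data$col2[j]
            count <- count + 1
        }
    }
    data$col3[i] <- total / count
}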

The final table is :

col1 | col2 | col3
  1  |  1   | 2.5
  1  |  2   | 2
  1  |  3   | 1.5
  2  |  4   | 5.5
  2  |  5   | 5
  2  |  6   | 4.5

I could use an apply function, but that would take about as long as the for loop, right? Any help with a vectorized version of this loop is appreciated.

jeremycg · Accepted Answer

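You can use dplyr. Within each col1 group, the mean of the other rows is the group sum minus the row's own value, divided by the group size minus one: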

library(dplyr)
dat %>% group_by(col1) %>%
        mutate(col3 = (sum(col2) - col2)/(n()-1))

Output:

Source: local data frame [6 x 3]
Groups: col1 [2]

   col1  col2  col3
  (int) (int) (dbl)
1     1     1   2.5
2     1     2   2.0
3     1     3   1.5
4     2     4   5.5
5     2     5   5.0
6     2     6   4.5
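
If you would rather not add a dependency, the same arithmetic can be sketched in base R with ave(), which returns a group-wise result aligned to the original rows. This is only an illustration, not part of the dplyr approach above: the data frame is rebuilt from the question's table, and grp_sum and grp_n are names made up for the example.

```r
# Example data rebuilt from the table in the question
dat <- data.frame(col1 = c(1, 1, 1, 2, 2, 2), col2 = 1:6)

# ave() applies FUN within each col1 group and returns a vector
# with one entry per original row
grp_sum <- ave(dat$col2, dat$col1, FUN = sum)     # sum of col2 in the row's group
grp_n   <- ave(dat$col2, dat$col1, FUN = length)  # number of rows in the row's group

# Leave-one-out mean: drop the row's own value from the group sum
dat$col3 <- (grp_sum - dat$col2) / (grp_n - 1)
```

A group with a single row gives 0/0, so col3 is NaN there, which is also what the original loop produces in that case.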
