Reputation: 323
Consider this data frame:
col1 | col2
1 | 1
1 | 2
1 | 3
2 | 4
2 | 5
2 | 6
I want to add a new column, say col3, to the data frame, defined as follows: the ith element col3[i] is the mean of all values col2[j], for all j such that col1[i] == col1[j] && i != j.
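The for loop for it goes like this:
n <- nrow(data)
data$col3 <- numeric(n)  # preallocate the result column
for (i in seq_len(n)) {
  total <- 0
  count <- 0
  for (j in seq_len(n)) {
    # accumulate col2 over the other rows that share row i's col1 value
    if (data$col1[j] == data$col1[i] && i != j) {
      total <- total + data$col2[j]
      count <- count + 1
    }
  }
  data$col3[i] <- total / count
}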
The final table is:
col1 | col2 | col3
1 | 1 | 2.5
1 | 2 | 2
1 | 3 | 1.5
2 | 4 | 5.5
2 | 5 | 5
2 | 6 | 4.5
I could use an apply function, but that would take pretty much as much time as the for loop, right? Any help with a vectorized version of this loop is appreciated.
Upvotes: 2
Views: 46
Reputation: 887048
This can be done with ave from base R:
df1$col3 <- with(df1, ave(col2, col1,
                          FUN = function(x) (sum(x) - x) / (length(x) - 1)))
Or using data.table:
library(data.table)
setDT(df1)[, col3 := (sum(col2) - col2) / (.N - 1), by = col1]
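To see why the formula works: within each group, the mean of the other rows equals the group total minus the current value, divided by the group size minus one. Here is a minimal base R sketch that checks this on the sample data. The construction of df1 and the helper name loo_mean are assumptions added for illustration; they are not part of the original post.

```r
# Sample data from the question (built by hand here; an assumption)
df1 <- data.frame(col1 = c(1, 1, 1, 2, 2, 2),
                  col2 = c(1, 2, 3, 4, 5, 6))

# Hypothetical helper: leave-one-out mean within a group.
# Subtract each value from the group total, then divide by the
# number of remaining group members.
loo_mean <- function(x) (sum(x) - x) / (length(x) - 1)

df1$col3 <- ave(df1$col2, df1$col1, FUN = loo_mean)

# Reference result: for each row, the plain mean of the other rows
# that share the same col1 value
rows <- seq_len(nrow(df1))
expected <- sapply(rows, function(i) {
  others <- df1$col1 == df1$col1[i] & rows != i
  mean(df1$col2[others])
})

# Both approaches should agree
stopifnot(isTRUE(all.equal(df1$col3, expected)))
```

Note that a group with only one row gives 0/0, which R evaluates to NaN. The original double loop produces the same value in that case.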
Upvotes: 2
Reputation: 24945
You can use dplyr:
library(dplyr)
dat %>%
  group_by(col1) %>%
  mutate(col3 = (sum(col2) - col2) / (n() - 1))
Source: local data frame [6 x 3]
Groups: col1 [2]
col1 col2 col3
(int) (int) (dbl)
1 1 1 2.5
2 1 2 2.0
3 1 3 1.5
4 2 4 5.5
5 2 5 5.0
6 2 6 4.5
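One case the sample data does not cover is a group with a single row, where the expression divides 0 by 0 and returns NaN. If NA is preferred there, a guard on the group size is one possible variant. This is only a sketch: the extra row with col1 = 3 is hypothetical, and the name res is introduced for illustration.

```r
library(dplyr)

# Hypothetical data: the question's sample plus one extra row
# whose group (col1 = 3) has a single member
dat <- data.frame(col1 = c(1, 1, 1, 2, 2, 2, 3),
                  col2 = c(1, 2, 3, 4, 5, 6, 7))

res <- dat %>%
  group_by(col1) %>%
  # n() is a single number per group, so a plain if/else works here
  mutate(col3 = if (n() > 1) (sum(col2) - col2) / (n() - 1) else NA_real_) %>%
  ungroup()
```

The first six rows keep the values shown above, and the single-row group gets NA instead of NaN.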
Upvotes: 3