Reputation: 1181
I am having difficulty subsetting my data by factors in a for loop. Here is an illustrative example:
x<-rnorm(n=40, m=0, sd=1)
y<-rep(1:5, 8)
df<-as.data.frame(cbind(x,y))
df_split<-split(df, df$y)
mean_vect<-rep(-99, 5)
for (i in c(1:5)) {
current_df<-df_split$i
mean_vect[i]<-mean(current_df)
}
This approach is not working because, I think, R is looking for a split called "i" when I really want it to pull out the ith split. I have also tried the subset function with little joy. I always run into these problems when I am trying to split on a non-numeric factor, so any help would be appreciated.
Upvotes: 0
Views: 951
Reputation: 23758
FYI, this kind of per-group calculation is typically done with tapply:
tapply( df$x, df$y, mean )
The first argument is the vector of values you want to average by group. The second is the INDEX, i.e. the variable that defines the groups, and the last is the function you want to run on each group, in this case mean.
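Since the question mentions non-numeric factors, here is a small sketch of the same call with a character grouping column. The data frame `dat` and its column names are made up for illustration and are not from the question.

```r
# Hypothetical data: 'group' holds character labels rather than numbers
dat <- data.frame(
  x     = c(1, 2, 3, 4, 5, 6),
  group = rep(c("a", "b", "c"), times = 2)
)

# tapply() splits dat$x by the values of dat$group and applies mean() to each piece
group_means <- tapply(dat$x, dat$group, mean)
group_means
```

The result is named by group label, so an individual mean can be looked up with `group_means[["a"]]`.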
Upvotes: 3
Reputation: 4932
To get split number i, run
df_split[[i]]
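As a rough sketch, the loop from the question could be repaired along these lines. `df_split$i` looks for an element literally named "i" and returns NULL, whereas `[[i]]` uses the value of `i`. The sketch also assumes you want the mean of column `x`, since `mean()` expects a numeric vector rather than a whole data frame. The `set.seed()` call is only there to make the example reproducible.

```r
set.seed(42)  # only for reproducibility; not part of the original question
x <- rnorm(n = 40, mean = 0, sd = 1)
y <- rep(1:5, 8)
df <- data.frame(x = x, y = y)
df_split <- split(df, df$y)

mean_vect <- numeric(length(df_split))
for (i in seq_along(df_split)) {
  current_df <- df_split[[i]]          # i-th split, selected by position
  mean_vect[i] <- mean(current_df$x)   # average the numeric column, not the data frame
}
names(mean_vect) <- names(df_split)    # label each mean with its group
mean_vect
```

With a non-numeric factor the same idea works by name: loop over `names(df_split)` and index with `df_split[[name]]`.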
BTW, since your final aim is mean_vect, you would be better off using
mean_vect <- sapply(df_split, function(d) mean(d$x))
or:
mean_vect <- tapply(df$x, df$y, mean)
mean_vect
1 2 3 4 5
0.2566810 -0.1528079 -0.2097333 -0.1540343 0.3609312
Upvotes: 1