Reputation: 1
The problem is that the new column needs to have its values repeated a certain number of times. The data looks like the following, but repeats for hundreds of thousands of rows:
Dataframe: Ratings
**USER#** **ITEM#** **Rating**
USER1 ITEM1 3
USER1 ITEM2 2
USER1 ITEM3 4
USER2 ITEM1 1
USER2 ITEM2 2
USER2 ITEM3 5
I want to add a column that holds each user's mean rating in every one of that user's rows, like this:
**USER#** **ITEM#** **Rating** **UserMean**
USER1 ITEM1 3 3
USER1 ITEM2 2 3
USER1 ITEM3 4 3
USER2 ITEM1 1 2.67
USER2 ITEM2 2 2.67
USER2 ITEM3 5 2.67
I know how to get all the User Means using the following:
UserMean<-tapply(Ratings$Rating,list(Ratings$User),mean)
This gives each user's mean and I want for each row to show that user's mean rating, but it doesn't work when I use:
Ratings$UserMean<-UserMean # or the above tapply function
How could I achieve my goal? I know how to create an array showing how many times each user voted. Could I use that array somehow?
Thanks
Upvotes: 0
Views: 63
Reputation: 92300
A data.table solution:
library(data.table)
setDT(dat)[, UserMean := mean(Rating), by = USER]
dat
Or, less efficiently than the approach proposed above, with base R functions:
merge(dat, aggregate(Rating ~ USER, dat, mean), by = "USER")
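One detail of the merge approach: because both data frames contain a Rating column, merge() returns them as Rating.x and Rating.y. A minimal sketch, assuming the same toy data as in the question, that renames the aggregated column before merging (the names `means` and `result` are only illustrative):

```r
# Toy data in the same shape as the question (values copied from the example)
dat <- data.frame(
  USER   = rep(c("USER1", "USER2"), each = 3),
  ITEM   = rep(c("ITEM1", "ITEM2", "ITEM3"), times = 2),
  Rating = c(3, 2, 4, 1, 2, 5)
)

# One row per user with that user's mean rating
means <- aggregate(Rating ~ USER, data = dat, FUN = mean)

# Rename the aggregated column so it does not clash with dat$Rating
names(means)[names(means) == "Rating"] <- "UserMean"

# Join the per-user means back onto every row of that user
result <- merge(dat, means, by = "USER")
result
```

Note that merge() sorts the result by the key column by default, so the row order may differ from the input.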
Upvotes: 2
Reputation: 70336
Another option is to use dplyr:
library(dplyr)
# %>% replaces the older %.% operator, which was removed from dplyr
Ratings <- Ratings %>% group_by(USER) %>% mutate(UserMean = mean(Rating))
# USER ITEM Rating UserMean
#1 USER1 ITEM1 3 3.000000
#2 USER1 ITEM2 2 3.000000
#3 USER1 ITEM3 4 3.000000
#4 USER2 ITEM1 1 2.666667
#5 USER2 ITEM2 2 2.666667
#6 USER2 ITEM3 5 2.666667
Upvotes: 1
Reputation: 121608
You should use ave (as mentioned in the other answer; I am including my answer anyway because I spent a lot of time cleaning up your data).
dat <- read.table(text='USER ITEM Rating
USER1 ITEM1 3
USER1 ITEM2 2
USER1 ITEM3 4
USER2 ITEM1 1
USER2 ITEM2 2
USER2 ITEM3 5',header=TRUE)
dat$UserMean <- ave(dat$Rating,dat$USER)
USER ITEM Rating UserMean
1 USER1 ITEM1 3 3.000000
2 USER1 ITEM2 2 3.000000
3 USER1 ITEM3 4 3.000000
4 USER2 ITEM1 1 2.666667
5 USER2 ITEM2 2 2.666667
6 USER2 ITEM3 5 2.666667
Another option is to use plyr:
library(plyr)
ddply(dat,.(USER),transform,userMean= mean(Rating))
Upvotes: 2
Reputation: 206566
You are close, you just need the ave() function. Try
Ratings<-data.frame(
User=rep(1:2, each=3),
Item=rep(letters[1:3], 2),
Rating=c(3,2,4,1,2,5)
)
UserMean <- ave(Ratings$Rating, Ratings$User, FUN=mean)
The ave() function calculates a value for each level of the grouping you specify and returns the results in the original row order of your data. In many cases it behaves like tapply, except that it doesn't collapse the values to one per level. It can also return a different value for each row within a level. For example
ReviewNum <- ave(Ratings$Rating, Ratings$User, FUN=seq_along)
which can track which review number it is for each user.
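To tie this back to the tapply attempt in the question: tapply returns one named mean per user, so it has to be expanded to one value per row before it can serve as a column. A rough sketch of two ways to do that, assuming the toy data below (the names `counts` and `UserMeanRep` are only illustrative):

```r
# Toy data in the same shape as the question
Ratings <- data.frame(
  User   = rep(c("USER1", "USER2"), each = 3),
  Item   = rep(c("ITEM1", "ITEM2", "ITEM3"), times = 2),
  Rating = c(3, 2, 4, 1, 2, 5)
)

# tapply() collapses to one mean per user, named by user
UserMean <- tapply(Ratings$Rating, Ratings$User, mean)

# Option 1: look up each row's user by name; works for any row order
Ratings$UserMean <- as.vector(UserMean[as.character(Ratings$User)])

# Option 2: repeat each mean by the number of ratings per user;
# this is only correct when the rows are already sorted by user
counts <- table(Ratings$User)
UserMeanRep <- rep(as.vector(UserMean), times = as.vector(counts))

# Both agree with ave() on this sorted example
all.equal(Ratings$UserMean, ave(Ratings$Rating, Ratings$User, FUN = mean))
all.equal(UserMeanRep, Ratings$UserMean)
```

The name lookup is the safer of the two, since it does not depend on how the rows are ordered; ave() does the same job in a single call.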
Upvotes: 2