elmergantry1960

Reputation: 1

I'm having trouble adding a column to a dataframe based on a function

The problem is that the new column needs to have its values repeated a certain number of times. The data looks like the following, but repeats for hundreds of thousands of rows:

Dataframe: Ratings

USER#   ITEM#   Rating
USER1   ITEM1   3
USER1   ITEM2   2
USER1   ITEM3   4
USER2   ITEM1   1
USER2   ITEM2   2
USER2   ITEM3   5

I want to add a column that holds each user's mean rating in every one of that user's rows, like the following:

USER#   ITEM#   Rating   UserMean
USER1   ITEM1   3        3
USER1   ITEM2   2        3
USER1   ITEM3   4        3
USER2   ITEM1   1        2.67
USER2   ITEM2   2        2.67
USER2   ITEM3   5        2.67

I know how to get all the User Means using the following:

UserMean<-tapply(Ratings$Rating,list(Ratings$User),mean)

This gives each user's mean. I want each row to show that user's mean rating, but it doesn't work when I use:

Ratings$UserMean<-UserMean # or the above tapply function

How could I achieve my goal? I know how to create an array showing how many times each user voted. Could I use that array somehow?

Thanks

Upvotes: 0

Views: 63

Answers (4)

David Arenburg

Reputation: 92300

data.table solution

library(data.table)
setDT(dat)[, UserMean := mean(Rating), by = USER]
dat

Or a less efficient use of base R functions than the ave() approach proposed in the other answers:

merge(dat, aggregate(Rating ~ USER, dat, mean), by = "USER")
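One thing to watch with the merge() version: both inputs have a column called Rating, so the result ends up with Rating.x and Rating.y rather than UserMean, and merge() sorts by the key by default, so the row order may change. A minimal sketch that renames the aggregated column first (the names `means` and `res` are just illustrative, and `dat` is rebuilt here from the sample data in the question):

```r
# Sample data from the question
dat <- read.table(text = "USER ITEM Rating
USER1 ITEM1 3
USER1 ITEM2 2
USER1 ITEM3 4
USER2 ITEM1 1
USER2 ITEM2 2
USER2 ITEM3 5", header = TRUE)

# One row per user with that user's mean rating
means <- aggregate(Rating ~ USER, dat, mean)

# Rename so the merged column is UserMean instead of Rating.y
names(means)[names(means) == "Rating"] <- "UserMean"

# Join the per-user means back onto every row of that user
res <- merge(dat, means, by = "USER")
res
```

If the original row order matters, it is probably safer to use ave() or the data.table version, since neither reorders the rows.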

Upvotes: 2

talat

Reputation: 70336

Another option is to use dplyr:

library(dplyr)

# %>% replaces the old %.% operator, which has been removed from dplyr
Ratings <- Ratings %>% group_by(USER) %>% mutate(UserMean = mean(Rating))

#   USER  ITEM Rating UserMean
#1 USER1 ITEM1      3 3.000000
#2 USER1 ITEM2      2 3.000000
#3 USER1 ITEM3      4 3.000000
#4 USER2 ITEM1      1 2.666667
#5 USER2 ITEM2      2 2.666667
#6 USER2 ITEM3      5 2.666667

Upvotes: 1

agstudy

Reputation: 121608

You should use ave(), as mentioned in the other answer. I'm including my answer anyway because I spent a lot of time cleaning up your data.

dat <- read.table(text='USER   ITEM    Rating
USER1     ITEM1 3
USER1     ITEM2   2
USER1     ITEM3   4
USER2     ITEM1   1
USER2     ITEM2   2
USER2     ITEM3   5',header=TRUE)

dat$UserMean <-  ave(dat$Rating,dat$USER)

   USER  ITEM Rating UserMean
1 USER1 ITEM1      3 3.000000
2 USER1 ITEM2      2 3.000000
3 USER1 ITEM3      4 3.000000
4 USER2 ITEM1      1 2.666667
5 USER2 ITEM2      2 2.666667
6 USER2 ITEM3      5 2.666667

Another option is to use plyr:

library(plyr)
ddply(dat, .(USER), transform, UserMean = mean(Rating))

Upvotes: 2

MrFlick

Reputation: 206566

You are close; you just need the ave() function. Try:

Ratings<-data.frame(
   User=rep(1:2, each=3),
   Item=rep(letters[1:3], 2),
   Rating=c(3,2,4,1,2,5)
)

UserMean <- ave(Ratings$Rating, Ratings$User, FUN=mean)

The ave() function calculates a value for each level of the grouping variable you specify and returns the results in the original row order, one per row. It's basically like tapply in many cases, but it doesn't collapse the values down to one per group. The function you pass can also return a different value for each row within a group. For example:

ReviewNum <- ave(Ratings$Rating, Ratings$User, FUN=seq_along)

which can track which review number it is for each user.
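To make the difference from tapply concrete, here is a small sketch on the same toy data. It also shows why the assignment in the question fails: tapply gives one value per user, while the data frame needs one value per row. The names `per_user` and `expanded` are just illustrative.

```r
Ratings <- data.frame(
  User   = rep(1:2, each = 3),
  Item   = rep(letters[1:3], 2),
  Rating = c(3, 2, 4, 1, 2, 5)
)

# tapply collapses to one value per user, named by user
per_user <- tapply(Ratings$Rating, Ratings$User, mean)
length(per_user)          # 2

# ave returns one value per row, in the original row order
Ratings$UserMean <- ave(Ratings$Rating, Ratings$User, FUN = mean)
length(Ratings$UserMean)  # 6

# The tapply result can be expanded to the same thing by looking up
# each row's user by name
expanded <- as.vector(per_user[as.character(Ratings$User)])
all.equal(expanded, Ratings$UserMean)
```

The name lookup does not depend on the rows being sorted by user, which a plain rep() of the means by the per-user counts would.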

Upvotes: 2
