Reputation: 3239
Background: In my problem, I have a data set where each row represents a particular dive, for a particular event, for a particular diver. Each with its own score, and judge.
What I want to get is the average score that a particular judge gave to each diver by country.
The Data: To illustrate what I am saying:
Event Round Diver Country Rank DiveNo Difficulty JScore Judge JCountry
1 M3mSB Final XIONG Ni CHN 1 1 3.1 8.0 RUIZ-PEDREGUERA Rolando CUB
2 M3mSB Final XIONG Ni CHN 1 1 3.1 9.0 GEAR Dennis NZL
3 M3mSB Final XIONG Ni CHN 1 1 3.1 8.5 BOYS Beverley CAN
4 M3mSB Final XIONG Ni CHN 1 1 3.1 8.5 JOHNSON Bente NOR
5 M3mSB Final XIONG Ni CHN 1 1 3.1 8.5 BOUSSARD Michel FRA
6 M3mSB Final XIONG Ni CHN 1 1 3.1 8.5 CALDERON Felix PUR
What I tried: which seems to work:
countries <- unique(x$Country[x$Judge==thisjudge])
AvgByCountry <- vector(mode = "numeric", length =length(countries))
for(i in 1:length(countries)){
AvgByCountry[i] <- mean(x$JScore[x$Country[x$Judge==thisjudge]==countries[i]])
}
names(AvgByCountry) <- countries
AvgByCountry
The Question: I understand it may not be the best loop, but is there a better way to do this? I tried sub-setting, and a few other things, but none of which gave me what I wanted.
Upvotes: 0
Views: 74
Reputation: 2313
Using data.tabe
:
library(data.table)
set.seed(100)
DT <- data.table(X = rnorm(20), "Country" = sample(c("US","UK"), 10, TRUE))
X Country
1: -0.50219235 US
2: 0.13153117 UK
3: -0.07891709 UK
4: 0.88678481 UK
5: 0.11697127 UK
6: 0.31863009 US
7: -0.58179068 UK
8: 0.71453271 UK
9: -0.82525943 US
10: -0.35986213 US
11: 0.08988614 US
12: 0.09627446 UK
13: -0.20163395 UK
14: 0.73984050 UK
15: 0.12337950 UK
16: -0.02931671 US
17: -0.38885425 UK
18: 0.51085626 UK
19: -0.91381419 US
20: 2.31029682 US
DT[, Mean:=mean(X), by= 'Country']
X Country Mean
1: -0.50219235 US 0.01104603
2: 0.13153117 UK 0.17241456
3: -0.07891709 UK 0.17241456
4: 0.88678481 UK 0.17241456
5: 0.11697127 UK 0.17241456
6: 0.31863009 US 0.01104603
7: -0.58179068 UK 0.17241456
8: 0.71453271 UK 0.17241456
9: -0.82525943 US 0.01104603
10: -0.35986213 US 0.01104603
11: 0.08988614 US 0.01104603
12: 0.09627446 UK 0.17241456
13: -0.20163395 UK 0.17241456
14: 0.73984050 UK 0.17241456
15: 0.12337950 UK 0.17241456
16: -0.02931671 US 0.01104603
17: -0.38885425 UK 0.17241456
18: 0.51085626 UK 0.17241456
19: -0.91381419 US 0.01104603
20: 2.31029682 US 0.01104603
Or as warmoverflow suggested using aggregate
aggregate(X ~ Country, data = DT, mean)
Country X
1 UK 0.17241456
2 US 0.01104603
EDIT as Update to Comment:
library(data.table)
set.seed(100)
DT <- data.table(X = rnorm(20), "Country" = sample(c("US","UK"), 10, TRUE), "Judge" = sample(c("James","Nick"), 10, TRUE)
aggregate(X ~ Country + Judge, data = DT, mean)
Country Judge X
1 UK James 0.1828624
2 US James 0.3045736
3 UK Nick 0.1201754
4 US Nick -0.8695368
Or data.table
method
DT[, Mean:=mean(X), by= c('Country', 'Judge')]
Upvotes: 2