Reputation: 1
I have written R code that simulates the results of a football league season with 92 teams:
mod3=glm(formula = Score ~ as.factor(Attack) + as.factor(Defence) + as.factor(Home), family = poisson, data = football)
for (i in 1:92){
  for (j in 1:92){
    if (i!=j){
      teamHome=levels(football$Attack)[i]
      teamAway=levels(football$Attack)[j]
      homeScore=rpois(1,predict.glm(mod3, data.frame(Attack=teamHome,Defence=teamAway,Home="Y "),type="response"))
      awayScore=rpois(1,predict.glm(mod3, data.frame(Attack=teamAway,Defence=teamHome,Home="N "),type="response"))
      Result= if(homeScore>awayScore){
        Result="H"
      } else if(homeScore<awayScore){
        Result="A"
      } else if(homeScore==awayScore){
        Result="D"
      }
      Results<-print(paste(teamHome,homeScore," ",teamAway,awayScore,Result),quote=F)
    }
  }
}
This prints the list of 8000 or so matches that I wanted.
However, when I run
teamHome
[1] "Aldershot "
I only get the first team in my output, and when I run
levels(teamHome)
NULL
This is the same for all my variables, and it makes it difficult to format the results as a 'league table'.
Is there a mistake in my code that means I am not getting the full list of "teamHome", or is there a way to access it?
I hope I have explained the problem clearly.
Thanks
Stephen
Upvotes: 0
Views: 1245
Reputation: 27408
Here's a simpler approach to simulating the scores, which takes advantage of the fact that we can predict to multiple new combinations of covariates at once.
First, let's simulate some data to fit the original model:
set.seed(1)
n <- 100000
att <- sample(LETTERS, n, TRUE)
def <- sapply(att, function(x) sample(LETTERS[-grep(x, LETTERS)], 1))
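# stringsAsFactors=TRUE keeps att and def as factors (no longer the default since R 4.0),
# which levels() relies on further down
X <- data.frame(att, def, home=factor(sample(0:1, n, TRUE)), stringsAsFactors=TRUE)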
mm <- model.matrix(~ ., data=X)
b <- rnorm(ncol(mm), sd=0.1)
mu <- exp(mm %*% b)
y <- rpois(length(mu), mu)
dat <- cbind(y, X)
head(dat)
y att def home
1 1 G S 1
2 1 J S 1
3 1 O H 1
4 1 X N 1
5 1 F W 0
6 2 X R 1
And fit the model:
mod <- glm(y ~ ., data=dat, family='poisson')
Comparing b and coef(mod) indicates that the model estimates the true coefficients relatively accurately (although we needed a large sample size to achieve this, given the many factor levels, and therefore many coefficients, that we are estimating).
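As a rough illustration of that comparison, here is a smaller self-contained sketch. It repeats the same simulation idea with only five made-up teams (my assumption, to keep it quick) and puts the true and estimated coefficients side by side. The names teams, cmp, truth and abs_error are just for this example.

```r
set.seed(42)
n <- 20000
teams <- LETTERS[1:5]  # hypothetical team labels, fewer than above so it runs quickly

att <- factor(sample(teams, n, TRUE), levels = teams)
# pick a defending team that differs from the attacking team
def <- factor(
  vapply(as.character(att), function(x) sample(setdiff(teams, x), 1),
         character(1), USE.NAMES = FALSE),
  levels = teams
)
X <- data.frame(att, def, home = factor(sample(0:1, n, TRUE)))

mm <- model.matrix(~ ., data = X)
b <- rnorm(ncol(mm), sd = 0.1)             # "true" coefficients
y <- rpois(n, as.vector(exp(mm %*% b)))    # simulated scores
mod <- glm(y ~ ., data = cbind(y, X), family = poisson)

# true vs. estimated coefficients, one row per model term
cmp <- data.frame(truth = b, estimate = unname(coef(mod)),
                  row.names = colnames(mm))
cmp$abs_error <- abs(cmp$estimate - cmp$truth)
print(round(cmp, 3))
```

With fewer observations per team the estimates would be noisier, which is why the example above uses such a large n.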
Now we can use the fitted model to predict for some new data. We can use expand.grid to return all combinations of an arbitrary number of factors. This is useful if we want to predict for all combinations of attacking team, defending team, and "home".
newdat <- setNames(expand.grid(levels(dat$att), levels(dat$def), factor(0:1)),
c('att', 'def', 'home'))
# now reduce newdat to exclude rows where att == def
newdat <- subset(newdat, att!=def)
sim.score <- rpois(nrow(newdat), predict(mod, newdat, type='response'))
results <- cbind(newdat, score=sim.score)
head(results)
att def home score
2 B A 0 1
3 C A 0 0
4 D A 0 2
5 E A 0 1
6 F A 0 2
7 G A 0 0
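Since the goal in the question was a league table, here is one possible sketch of how scores in this shape could be turned into one. It is self-contained, so it builds a toy stand-in for results with four made-up teams and an arbitrary Poisson mean rather than using the fitted model. The 3/1/0 points scheme and the ordering by points, goal difference and goals scored are my assumptions, and names such as matches, long and league are only for illustration.

```r
set.seed(1)
teams <- c("A", "B", "C", "D")  # hypothetical teams

# Toy stand-in for `results`: one simulated score per
# (attacking team, defending team, home flag) combination.
results <- subset(
  expand.grid(att = teams, def = teams, home = factor(0:1),
              stringsAsFactors = FALSE),
  att != def
)
results$score <- rpois(nrow(results), 1.3)  # arbitrary mean, not a fitted value

# A fixture is "H hosts A". The home side's goals are the row with
# att = H, def = A, home = 1; the away side's goals are the row with
# att = A, def = H, home = 0.
home_rows <- results[results$home == "1", c("att", "def", "score")]
names(home_rows) <- c("home_team", "away_team", "home_goals")
away_rows <- results[results$home == "0", c("def", "att", "score")]
names(away_rows) <- c("home_team", "away_team", "away_goals")
matches <- merge(home_rows, away_rows, by = c("home_team", "away_team"))

# One row per team per match, from that team's point of view.
long <- rbind(
  data.frame(team = matches$home_team, gf = matches$home_goals,
             ga = matches$away_goals),
  data.frame(team = matches$away_team, gf = matches$away_goals,
             ga = matches$home_goals)
)
long$pts <- ifelse(long$gf > long$ga, 3, ifelse(long$gf == long$ga, 1, 0))
long$played <- 1

# Sum up per team and sort into table order.
league <- aggregate(cbind(played, pts, gf, ga) ~ team, data = long, FUN = sum)
league$gd <- league$gf - league$ga
league <- league[order(-league$pts, -league$gd, -league$gf), ]
print(league, row.names = FALSE)
```

The key difference from the loop in the question is that every simulated score is kept in a data frame instead of being overwritten on each iteration, so the whole season is available for summarising afterwards.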
Upvotes: 2