Reputation: 522
How would I extract the residuals for a specific baseball team from the following linear model? For example, how would I extract the residuals for "CLE"?
library(Lahman)
library(dplyr)
library(broom)
# create baseball team data
data(Teams)
teams <- Teams
teams <- teams %>% mutate(win_percentage = (W / (W + L)) * 100)
# summarize baseball team salary by year
salaries <- Salaries
salaries <- salaries %>%
  group_by(teamID, yearID, lgID) %>%
  summarise(payroll_M = sum(as.numeric(salary)) / 10^6) %>%
  ungroup()
# add winning percentage to the salary table
salaries <- teams %>%
  select(yearID, teamID, win_percentage) %>%
  right_join(salaries, by = c("yearID", "teamID"))
# compute linear model of winning vs team salary
model <- salaries %>%
  group_by(yearID) %>%
  do(fit = augment(lm(win_percentage ~ payroll_M, data = .)))
# extract residuals for Cleveland ??????
Upvotes: 1
Views: 1836
Reputation: 78610
You're close, but you need two changes to the augment line.
First, you're saving the resulting (augmented) data frame to a column called fit. Instead, pass it directly to do() (that is, remove the fit = part).
Second, the augment function needs to keep the teamID column as part of the resulting data, even though it's not in the model. Note that augment takes a second argument, data, for exactly this purpose (see help(augment.lm) for more).
Thus, the new line would look like:
do(augment(lm(win_percentage ~ payroll_M, data = .), data = .))
The resulting data frame will have one row per original observation, and will include the teamID along with the residuals and fitted values (which allows you to filter for CLE).
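Putting it together, here is a rough end-to-end sketch of the question's pipeline with that line swapped in, followed by the filter for Cleveland. The name cle_residuals and the filter on missing win_percentage are my own additions (the latter is only a precaution in case some salary rows have no matching team record), and .resid / .fitted are the column names augment uses for residuals and fitted values.

```r
library(Lahman)
library(dplyr)
library(broom)

# winning percentage per team-season
teams <- Teams %>%
  mutate(win_percentage = (W / (W + L)) * 100)

# payroll (in millions) per team-season
salaries <- Salaries %>%
  group_by(teamID, yearID, lgID) %>%
  summarise(payroll_M = sum(as.numeric(salary)) / 10^6) %>%
  ungroup()

# add winning percentage to the salary table
salaries <- teams %>%
  select(yearID, teamID, win_percentage) %>%
  right_join(salaries, by = c("yearID", "teamID")) %>%
  # precaution (assumption, not in the original code): drop unmatched rows
  filter(!is.na(win_percentage))

# one regression per year; data = . keeps teamID in the augmented output
model <- salaries %>%
  group_by(yearID) %>%
  do(augment(lm(win_percentage ~ payroll_M, data = .), data = .))

# residuals for Cleveland, one row per season
cle_residuals <- model %>%
  ungroup() %>%
  filter(teamID == "CLE") %>%
  select(yearID, teamID, win_percentage, payroll_M, .fitted, .resid)

print(cle_residuals)
```

Each row of cle_residuals is one Cleveland season, and .resid is that season's win_percentage minus the value fitted from that year's regression.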
Upvotes: 4