Reputation: 11
Notice that your graphic from Problem 4 shows a quadratic (curved) relationship between log_wage and exp. The next task is to plot three quadratic functions, one for each race level: "black", "white" and "other". To estimate the quadratic fit, you can use the following function quad_fit:
```{r}
quad_fit <- function(data_sub) {
return(lm(log_wage~exp+I(exp^2),data=data_sub)$coefficients)
}
quad_fit(salary_data)
```
The above function computes the least squares quadratic fit and returns the coefficients $a_1$, $a_2$, $a_3$, where
$$\hat{Y} = a_1 + a_2 x + a_3 x^2$$
with $\hat{Y}$ = log(wage) and $x$ = exp.
Use ggplot to accomplish this task or use base R graphics for partial credit. Make sure to include a legend and appropriate labels.
My attempt:
blackfit <- quad_fit(salary_data[salary_data$race == "black",])
whitefit <- quad_fit(salary_data[salary_data$race == "white",])
otherfit <- quad_fit(salary_data[salary_data$race == "other",])
yblack <- blackfit[1] + blackfit[2]*salary_data$exp + blackfit[3]*(salary_data$exp)^2
ywhite <- whitefit[1] + whitefit[2]*salary_data$exp + whitefit[3]*(salary_data$exp)^2
yother <- otherfit[1] + otherfit[2]*salary_data$exp + otherfit[3]*(salary_data$exp)^2
soloblack <- salary_data[salary_data$race == "black",]
solowhite <- salary_data[salary_data$race == "white",]
soloother <- salary_data[salary_data$race == "other",]
ggplot(data = soloblack) +
geom_point(aes(x = exp, y = log_wage)) +
stat_smooth(aes(y = log_wage, x = exp), formula = y ~ yblack)
This is only a first attempt, for the data filtered to race == "black". I am not clear on what the formula should look like, because the quad_fit function seems to already do the calculation for you.
Upvotes: 1
Views: 212
Reputation: 107737
Consider plotting the fitted values computed from the output of quad_fit (as shown by @StefanK here), and use by to run the plot for each distinct value of race:
library(ggplot2)

reg_plot <- function(sub) {
  # PREDICTED DATA FOR LINE PLOT
  # quad_fit() returns the coefficient vector (a1, a2, a3), not an lm object,
  # so predict() cannot be used; evaluate the quadratic directly instead
  q_fit <- quad_fit(sub)
  predicted_df <- data.frame(
    wage_pred = q_fit[1] + q_fit[2] * sub$exp + q_fit[3] * sub$exp^2,
    exp = sub$exp
  )
  # ORIGINAL SCATTER PLOT WITH PREDICTED LINE
  ggplot(data = sub) +
    geom_point(aes(x = exp, y = log_wage, alpha = exp)) +
    labs(x = "Job Experience", y = "Log of Wage",
         title = paste("Wage and Job Experience Plot for",
                       sub$race[[1]], "in Salary Dataset")) +
    geom_line(color = "red", data = predicted_df, aes(x = exp, y = wage_pred))
}
# RUN GRAPHS FOR EACH RACE
by(salary_data, salary_data$race, reg_plot)
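Since the assignment asks for a legend, a single plot with color mapped to race may be closer to what is wanted than one plot per race. Below is a minimal sketch of that idea, assuming quad_fit as defined in the question. The simulated salary_data and the helper curve_for are hypothetical and only there to make the example runnable; swap in the real data frame.

```r
library(ggplot2)

# quad_fit() as defined in the question: returns coefficients a1, a2, a3
quad_fit <- function(data_sub) {
  return(lm(log_wage ~ exp + I(exp^2), data = data_sub)$coefficients)
}

# ASSUMPTION: simulated stand-in for salary_data (columns exp, log_wage, race)
set.seed(1)
n <- 300
salary_data <- data.frame(
  exp  = runif(n, 0, 40),
  race = sample(c("black", "white", "other"), n, replace = TRUE)
)
salary_data$log_wage <- 5 + 0.06 * salary_data$exp -
  0.001 * salary_data$exp^2 + rnorm(n, sd = 0.3)

# Hypothetical helper: evaluate one race's fitted quadratic on a grid of exp
curve_for <- function(sub) {
  a <- unname(quad_fit(sub))
  grid <- seq(min(sub$exp), max(sub$exp), length.out = 100)
  data.frame(exp = grid,
             log_wage = a[1] + a[2] * grid + a[3] * grid^2,
             race = sub$race[1])
}
curves <- do.call(rbind, lapply(split(salary_data, salary_data$race), curve_for))

# One plot: points and fitted curves colored by race, which creates the legend
p <- ggplot(salary_data, aes(x = exp, y = log_wage, color = race)) +
  geom_point(alpha = 0.3) +
  geom_line(data = curves) +
  labs(x = "Job Experience", y = "Log of Wage", color = "Race",
       title = "Quadratic Fit of Log Wage on Experience by Race")
print(p)
```

If using quad_fit is not strictly required, ggplot2 can also fit the same quadratic itself with geom_smooth(method = "lm", formula = y ~ x + I(x^2), se = FALSE). Inside that formula, y and x refer to the plot aesthetics, not to column names, which is likely why formula = y ~ yblack did not work.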
Upvotes: 1