Reputation: 8009
I want to plot fee as a percentage of income (fee_per_inc) for each income (year_hh_inc) quintile.
This is what I have so far:
pacman::p_load(RCurl, plm, tibble, ggplot2, AER, dplyr, car, arm, broom, tidyr, fastDummies, dummies)
x <- getURL("https://raw.githubusercontent.com/dothemathonthatone/maps/master/main_test.csv")
maindf <- read.csv(text = x, row.names=NULL)
maindf <- maindf %>%
mutate(category = cut(year_hh_inc, breaks = (quantile(year_hh_inc, c(0, 1 / 5, 2 / 5, 3 / 5, 4 / 5, 1), na.rm = TRUE)), labels = c("first_quint", "second_quint", "third_quint", 'fourth_quint', 'fifth_quint'), include.lowest = TRUE), vals = 1) %>%
pivot_wider(names_from = category, values_from = vals, values_fill = list(vals = 0))
box <- boxplot(maindf$year_hh_inc ~ maindf$fee_per_inc, col = 3:5)
This is what I would like as an end result:
I think I have a bit more work to do; any help from this point is appreciated.
Upvotes: 1
Views: 167
Reputation: 173793
I think there were a few problems here. The boxplot needs the variables the other way round: fee_per_inc goes on the left of the formula and the grouping variable on the right. You also need to group by the category variable that you created in mutate instead of the original year_hh_inc variable. Lastly, you don't need the pivot_wider step.
Some of the values were also way outside the useful range and may have been wrong (some numbers were -8), so I have kept only the rows where fee_per_inc is between 0 and 0.01 to make the graph readable. You'll want to check the original data to see whether this makes sense.
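Before dropping rows, it can help to count what the filter would remove. The following is a minimal sketch on made-up numbers, not the real data; the vector below and the reading of -8 as a missing-value code are assumptions for illustration only.

```r
# Made-up stand-in for maindf$fee_per_inc. The -8 entries mimic the
# suspicious values mentioned above (assumed here to be a missing-data code).
fee_per_inc <- c(0.002, 0.004, -8, 0.0005, 0.03, NA, 0.007, -8, 0.001, 0.25)

# Flag values inside the (0, 0.01) window used for the plot.
# Comparisons with NA return NA, so NA is treated as out of range explicitly.
in_range <- !is.na(fee_per_inc) & fee_per_inc > 0 & fee_per_inc < 0.01

# How many rows would be dropped, and which values are they?
n_dropped <- sum(!in_range)
dropped_values <- table(fee_per_inc[!in_range], useNA = "ifany")

print(n_dropped)
print(dropped_values)
```

If -8 does turn out to be a missing-data code, recoding it to NA before computing the quintiles may be cleaner than filtering afterwards.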
pacman::p_load(RCurl, plm, tibble, ggplot2, AER, dplyr, car, arm, broom, tidyr, fastDummies, dummies)
x <- getURL("https://raw.githubusercontent.com/dothemathonthatone/maps/master/main_test.csv")
maindf <- read.csv(text = x, row.names=NULL)
# Bin household income into quintiles (no pivot_wider needed)
maindf <- maindf %>%
  mutate(category = cut(year_hh_inc,
                        breaks = (quantile(year_hh_inc, c(0, 1/5, 2/5, 3/5, 4/5, 1), na.rm = TRUE)),
                        labels = c("first_quint", "second_quint", "third_quint",
                                   'fourth_quint', 'fifth_quint'),
                        include.lowest = TRUE),
         vals = 1)
# Keep only fee shares strictly between 0 and 0.01 (drops values such as -8)
maindf <- maindf[maindf$fee_per_inc > 0 & maindf$fee_per_inc < 0.01, ]
# Fee share on the y-axis, income quintile on the x-axis
box <- boxplot(maindf$fee_per_inc ~ maindf$category, col = 3:5)
Created on 2020-03-03 by the reprex package (v0.3.0)
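For reference, the same idea can be sketched without downloading anything. This is only an illustration on simulated data: the data frame df and its values are made up, while the column names and quintile labels are copied from the code above. It also passes data = df to boxplot so the formula can use bare column names.

```r
set.seed(1)  # made-up data, seeded only so the sketch is reproducible

# Hypothetical stand-in for maindf with the two columns used above.
df <- data.frame(
  year_hh_inc = round(runif(200, min = 10000, max = 120000)),
  fee_per_inc = runif(200, min = 0.0001, max = 0.009)
)

# Quintile break points, then one factor level per quintile.
breaks <- quantile(df$year_hh_inc, probs = seq(0, 1, by = 0.2), na.rm = TRUE)
df$category <- cut(
  df$year_hh_inc,
  breaks = breaks,
  labels = c("first_quint", "second_quint", "third_quint",
             "fourth_quint", "fifth_quint"),
  include.lowest = TRUE
)

# The formula reads "value ~ group": fee share on the y-axis, quintile on the x-axis.
box <- boxplot(fee_per_inc ~ category, data = df, col = 3:5,
               xlab = "Income quintile", ylab = "Fee as a share of income")
```

Swapping the two sides of the formula, as in the question, would instead draw one box per distinct fee_per_inc value, which is why the original plot did not look as expected.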
Upvotes: 1