Collective Action

Reputation: 8009

How to use box plot with column range

I want to plot the fee as a percentage of income (fee_per_inc) for each quintile of household income (year_hh_inc).

This is what I have so far:

pacman::p_load(RCurl, plm, tibble, ggplot2, AER, dplyr, car, arm, broom, tidyr, fastDummies, dummies)


x <- getURL("https://raw.githubusercontent.com/dothemathonthatone/maps/master/main_test.csv")
maindf <- read.csv(text = x, row.names = NULL)

maindf <- maindf %>% 
 mutate(category = cut(year_hh_inc, breaks = (quantile(year_hh_inc, c(0, 1 / 5, 2 / 5, 3 / 5, 4 / 5, 1), na.rm = TRUE)), labels = c("first_quint", "second_quint", "third_quint", 'fourth_quint', 'fifth_quint'), include.lowest = TRUE), vals = 1) %>% 
 pivot_wider(names_from = category, values_from = vals, values_fill = list(vals = 0))


box  <- boxplot(maindf$year_hh_inc ~ maindf$fee_per_inc, col = 3:5)

This is what I would like as an end result:

Desired End Result

I think I have a bit more work to do; any help from this point is appreciated.

Upvotes: 1

Views: 167

Answers (1)

Allan Cameron

Reputation: 173793

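I think there were three problems here. First, the boxplot formula needs the variables the other way round: the value being plotted (fee_per_inc) goes on the left of the ~ and the grouping variable on the right. Second, the grouping variable should be the category column you created in mutate, not the original year_hh_inc. Lastly, you don't need the pivot_wider step, because boxplot wants a single factor column rather than one dummy column per quintile.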

Some of the values were also far outside the useful range and may have been wrong (some numbers were -8), so I have kept only the rows where fee_per_inc is between 0 and 0.01 to make the graph readable. You'll want to check the original data to see whether this makes sense.
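One way to do that check before trimming is to count what falls outside the window. This is only a sketch on made-up numbers: the vector below is a hypothetical stand-in for maindf$fee_per_inc, and the (0, 0.01) window is the one used in the code further down.

```r
# Hypothetical stand-in for maindf$fee_per_inc; the real column will differ.
fee_per_inc <- c(-8, -8, 0.001, 0.002, 0.004, 0.006, 0.009, 0.5, NA)

# Overall distribution, including the number of NAs
summary(fee_per_inc)

# TRUE for values strictly inside (0, 0.01); NAs count as out of range
in_range <- !is.na(fee_per_inc) & fee_per_inc > 0 & fee_per_inc < 0.01

# How many values would be kept and how many dropped
table(in_range)

# The dropped values themselves, to judge whether they are codes or real data
dropped <- fee_per_inc[!in_range]
dropped
```

If most of the dropped values turn out to be a single repeated number such as -8, that is worth checking against the data documentation before discarding them.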

pacman::p_load(RCurl, plm, tibble, ggplot2, AER, dplyr, car, arm, broom, tidyr, fastDummies, dummies)

x <- getURL("https://raw.githubusercontent.com/dothemathonthatone/maps/master/main_test.csv")
maindf <- read.csv(text = x, row.names = NULL)

# Bin income into quintiles; no pivot_wider (and so no vals column) is needed
maindf <- maindf %>% 
  mutate(category = cut(year_hh_inc, 
                        breaks = quantile(year_hh_inc, c(0, 1/5, 2/5, 3/5, 4/5, 1), na.rm = TRUE), 
                        labels = c("first_quint", "second_quint", "third_quint",
                                   "fourth_quint", "fifth_quint"), 
                        include.lowest = TRUE))

# Keep only fee shares between 0 and 0.01; which() also drops rows where fee_per_inc is NA
maindf <- maindf[which(maindf$fee_per_inc > 0 & maindf$fee_per_inc < 0.01), ]

# Formula is y ~ group: fee share on the left, income quintile on the right
box <- boxplot(fee_per_inc ~ category, data = maindf, col = 3:5)

Created on 2020-03-03 by the reprex package (v0.3.0)
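If the CSV is not reachable, the same idea can be tried on simulated data. The sketch below uses only base R; the data frame df and its values are invented for illustration and only the column names are taken from the question.

```r
set.seed(1)

# Made-up data: 100 households with hypothetical incomes and fee shares
df <- data.frame(
  year_hh_inc = rlnorm(100, meanlog = 10, sdlog = 0.5),
  fee_per_inc = runif(100, min = 0.001, max = 0.009)
)

# Quintile break points of income (0%, 20%, ..., 100%)
breaks <- quantile(df$year_hh_inc, probs = seq(0, 1, by = 0.2), na.rm = TRUE)

# One factor column holding the quintile each row falls into
df$category <- cut(df$year_hh_inc,
                   breaks = breaks,
                   labels = c("first_quint", "second_quint", "third_quint",
                              "fourth_quint", "fifth_quint"),
                   include.lowest = TRUE)

# Formula is y ~ group: fee share on the left, quintile on the right
box <- boxplot(fee_per_inc ~ category, data = df, col = 3:5,
               xlab = "Income quintile", ylab = "Fee as a share of income")

# Number of observations behind each box; each quintile should hold about a fifth
box$n
```

Passing data = df keeps the axis labels and the formula free of the df$ prefix, and box$n is a quick sanity check that the quintile bins came out as expected.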

Upvotes: 1
