Applying Functions to Filtered Rows in R

Question

I have a long data set where each row is a state's income tax rate for that year. Here are a few rows as an example:

State   statefip year  TopRate
            
1 Alabama        1 2018     0.05
2 Alabama        1 2017     0.05
3 Alabama        1 2016     0.05
4 Alabama        1 2015     0.05
5 Alabama        1 2014     0.05
6 Alabama        1 2013     0.05
7 Alabama        1 2012     0.05
8 Alabama        1 2011     0.05
9 Alabama        1 2010     0.05
10 Alaska         2 2018     0

I would like to create an additional column that is a quantile (.5, .75, .80.. and so on) for each year. So, the .5 (median) column would have the median tax rate for all states in the year 2018 and so on for each year. To clarify, every observation for a given state in the year 2018 would have the same tax rate for the .5 but they would change for each year. While this may seem weird, it will help me when I have to merge this data with other data I have later. Thanks in advance for any help!

Edit: here's what I want..

State   statefip year  TopRate median     q3    q80    q85    q90 jenkb$jenkb
                              
 1 Alabama        1 2018     0.05 0.0575 0.0700 0.074  0.0888 0.0897           0
 2 Alabama        1 2017     0.05 0.0575 0.0700 0.074  0.0854 0.0895           0
 3 Alabama        1 2016     0.05 0.0575 0.0700 0.074  0.0854 0.0895           0
 4 Alabama        1 2015     0.05 0.0575 0.0700 0.074  0.0888 0.0897           0
 5 Alabama        1 2014     0.05 0.058  0.07   0.0765 0.0888 0.0897           0
 6 Alabama        1 2013     0.05 0.0599 0.072  0.0775 0.0839 0.0895           0
 7 Alabama        1 2012     0.05 0.06   0.072  0.0775 0.0866 0.0895           0
 8 Alabama        1 2011     0.05 0.06   0.0738 0.078  0.085  0.0897           0
 9 Alabama        1 2010     0.05 0.06   0.0738 0.078  0.0872 0.0897           1
10 Alaska         2 2018     0    0.0575 0.0700 0.074  0.0888 0.0897           0
# ... with 449 more rows

Except I want the JenkB column to be a 1 for every year the state fits into the second bucket for the Jenks break for that given year.

taxlong %>%
  group_by(year) %>% 
  mutate(median = quantile(TopRate, .5)) %>%
  mutate(q3 = quantile(TopRate, .75)) %>%
  mutate(q80 = quantile(TopRate, .80)) %>%
  mutate(q85 = quantile(TopRate, .85)) %>%
  mutate(q90 = quantile(TopRate, .90)) #%>%
  #mutate(jenkb = as.integer((TopRate > 0.0323 & year == 2018 | 2017 | 2015) | (TopRate > 0.0375 & year == 2016) | (TopRate > 0.034 & year == 2014 | 2013 | 2012 | 2011 | 2010)))

This code didn't work

akrun · Accepted Answer

We can use quantile with cut to create a column after grouping by the group columns 'year'

library(dplyr)
df1 %>%
    group_by( year) %>% 
    mutate(new = cut(TopRate, breaks = c(-Inf, quantile(TopRate, 
            probs = c(.5, .75, .8) ), Inf)))

If we wanted to create new columns with quantile values (values will be repeated for each row), place it in a list and then unnest it to 'wide' format

library(tidyr)
df1 %>% 
   group_by(year) %>%
   mutate(new = list(quantile(TopRate, probs = c(.5, .75, .8)))) %>% 
   unnest_wider(c(new))

Or with data.table

library(data.table)
setDT(df1)[, new := cut(TopRate, breaks = c(-Inf, quantile(TopRate, probs = c(.5, .75, .8)), Inf)), .( year)]

Applying Functions to Filtered Rows in R

Answers (2)

Related Questions