Reputation: 137
I have a long data set where each row is a state's income tax rate for that year. Here are a few rows as an example:
State statefip year TopRate
<chr> <dbl> <fct> <dbl>
1 Alabama 1 2018 0.05
2 Alabama 1 2017 0.05
3 Alabama 1 2016 0.05
4 Alabama 1 2015 0.05
5 Alabama 1 2014 0.05
6 Alabama 1 2013 0.05
7 Alabama 1 2012 0.05
8 Alabama 1 2011 0.05
9 Alabama 1 2010 0.05
10 Alaska 2 2018 0
I would like to create an additional column that is a quantile (.5, .75, .80.. and so on) for each year. So, the .5 (median) column would have the median tax rate for all states in the year 2018 and so on for each year. To clarify, every observation for a given state in the year 2018 would have the same tax rate for the .5 but they would change for each year. While this may seem weird, it will help me when I have to merge this data with other data I have later. Thanks in advance for any help!
Edit: here's what I want..
State statefip year TopRate median q3 q80 q85 q90 jenkb$jenkb
<chr> <dbl> <fct> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <int>
1 Alabama 1 2018 0.05 0.0575 0.0700 0.074 0.0888 0.0897 0
2 Alabama 1 2017 0.05 0.0575 0.0700 0.074 0.0854 0.0895 0
3 Alabama 1 2016 0.05 0.0575 0.0700 0.074 0.0854 0.0895 0
4 Alabama 1 2015 0.05 0.0575 0.0700 0.074 0.0888 0.0897 0
5 Alabama 1 2014 0.05 0.058 0.07 0.0765 0.0888 0.0897 0
6 Alabama 1 2013 0.05 0.0599 0.072 0.0775 0.0839 0.0895 0
7 Alabama 1 2012 0.05 0.06 0.072 0.0775 0.0866 0.0895 0
8 Alabama 1 2011 0.05 0.06 0.0738 0.078 0.085 0.0897 0
9 Alabama 1 2010 0.05 0.06 0.0738 0.078 0.0872 0.0897 1
10 Alaska 2 2018 0 0.0575 0.0700 0.074 0.0888 0.0897 0
# ... with 449 more rows
Except I want the JenkB column to be a 1 for every year the state fits into the second bucket for the Jenks break for that given year.
taxlong %>%
group_by(year) %>%
mutate(median = quantile(TopRate, .5)) %>%
mutate(q3 = quantile(TopRate, .75)) %>%
mutate(q80 = quantile(TopRate, .80)) %>%
mutate(q85 = quantile(TopRate, .85)) %>%
mutate(q90 = quantile(TopRate, .90)) #%>%
#mutate(jenkb = as.integer((TopRate > 0.0323 & year == 2018 | 2017 | 2015) | (TopRate > 0.0375 & year == 2016) | (TopRate > 0.034 & year == 2014 | 2013 | 2012 | 2011 | 2010)))
This code didn't work
Upvotes: 1
Views: 71
Reputation: 887173
We can use quantile
with cut
to create a column after grouping by the group columns 'year'
library(dplyr)
df1 %>%
group_by( year) %>%
mutate(new = cut(TopRate, breaks = c(-Inf, quantile(TopRate,
probs = c(.5, .75, .8) ), Inf)))
If we wanted to create new columns with quantile
values (values will be repeated for each row), place it in a list
and then unnest
it to 'wide' format
library(tidyr)
df1 %>%
group_by(year) %>%
mutate(new = list(quantile(TopRate, probs = c(.5, .75, .8)))) %>%
unnest_wider(c(new))
Or with data.table
library(data.table)
setDT(df1)[, new := cut(TopRate, breaks = c(-Inf, quantile(TopRate, probs = c(.5, .75, .8)), Inf)), .( year)]
Upvotes: 3
Reputation: 137
thank you for your help.
taxlong %>%
group_by(year) %>%
mutate(median = quantile(TopRate, .5))
works perfectly, I'll just repeat the process for each of the quantiles I need. I need to keep the values for my later analysis.
Upvotes: 1