ModalBro
ModalBro

Reputation: 554

Coding dummy variable based on max value of two groups in R?

Building off of this question, I'm trying to create the same dummy variable for the maximum value of each group, but instead of just grouping by the month column, I'd like to group on both month and year so I get the maximum value for both, i.e. a dummy for the max value in 1-2017, 2-2017 , etc.

I've created a demo dataset here as well as some code showing the undesired result when I try to group by two columns.

df<- data.frame(date= seq.Date(from = as.Date('2017-01-01'), by= 14, length.out = 100), var1= rnorm(20, 5, 3))
df$month<- as.numeric(strftime(df$date, "%m"))
df$year <- as.numeric(strftime(df$date, "%Y"))

df <- df %>%
    group_by(year, month) %>%
    mutate(dummy = as.integer(var1 == max(var1))) %>%
    ungroup

When I run this code, this is the output I get.

structure(list(date = structure(c(17167L, 17181L, 17195L, 17209L, 
17223L, 17237L, 17251L, 17265L, 17279L, 17293L, 17307L, 17321L, 
17335L, 17349L, 17363L, 17377L, 17391L, 17405L, 17419L, 17433L, 
17447L, 17461L, 17475L, 17489L, 17503L, 17517L, 17531L, 17545L, 
17559L, 17573L, 17587L, 17601L, 17615L, 17629L, 17643L, 17657L, 
17671L, 17685L, 17699L, 17713L, 17727L, 17741L, 17755L, 17769L, 
17783L, 17797L, 17811L, 17825L, 17839L, 17853L, 17867L, 17881L, 
17895L, 17909L, 17923L, 17937L, 17951L, 17965L, 17979L, 17993L, 
18007L, 18021L, 18035L, 18049L, 18063L, 18077L, 18091L, 18105L, 
18119L, 18133L, 18147L, 18161L, 18175L, 18189L, 18203L, 18217L, 
18231L, 18245L, 18259L, 18273L, 18287L, 18301L, 18315L, 18329L, 
18343L, 18357L, 18371L, 18385L, 18399L, 18413L, 18427L, 18441L, 
18455L, 18469L, 18483L, 18497L, 18511L, 18525L, 18539L, 18553L
), class = "Date"), var1 = c(7.610410421533, 4.39838519593849, 
7.29367017299385, 4.65490632032971, 5.98987372646137, 7.89362434824241, 
7.37466247923975, 3.55735383387344, 2.04726523393801, 7.85753545892588, 
8.9757840439474, 3.03010111198631, 8.12870854721674, 4.94313056382185, 
10.3008527572542, 3.38195982516332, 2.38095744572549, 2.33860273928046, 
4.55108829246478, 10.3054094468083, 7.610410421533, 4.39838519593849, 
7.29367017299385, 4.65490632032971, 5.98987372646137, 7.89362434824241, 
7.37466247923975, 3.55735383387344, 2.04726523393801, 7.85753545892588, 
8.9757840439474, 3.03010111198631, 8.12870854721674, 4.94313056382185, 
10.3008527572542, 3.38195982516332, 2.38095744572549, 2.33860273928046, 
4.55108829246478, 10.3054094468083, 7.610410421533, 4.39838519593849, 
7.29367017299385, 4.65490632032971, 5.98987372646137, 7.89362434824241, 
7.37466247923975, 3.55735383387344, 2.04726523393801, 7.85753545892588, 
8.9757840439474, 3.03010111198631, 8.12870854721674, 4.94313056382185, 
10.3008527572542, 3.38195982516332, 2.38095744572549, 2.33860273928046, 
4.55108829246478, 10.3054094468083, 7.610410421533, 4.39838519593849, 
7.29367017299385, 4.65490632032971, 5.98987372646137, 7.89362434824241, 
7.37466247923975, 3.55735383387344, 2.04726523393801, 7.85753545892588, 
8.9757840439474, 3.03010111198631, 8.12870854721674, 4.94313056382185, 
10.3008527572542, 3.38195982516332, 2.38095744572549, 2.33860273928046, 
4.55108829246478, 10.3054094468083, 7.610410421533, 4.39838519593849, 
7.29367017299385, 4.65490632032971, 5.98987372646137, 7.89362434824241, 
7.37466247923975, 3.55735383387344, 2.04726523393801, 7.85753545892588, 
8.9757840439474, 3.03010111198631, 8.12870854721674, 4.94313056382185, 
10.3008527572542, 3.38195982516332, 2.38095744572549, 2.33860273928046, 
4.55108829246478, 10.3054094468083), month = c(1, 1, 1, 2, 2, 
3, 3, 4, 4, 5, 5, 6, 6, 7, 7, 7, 8, 8, 9, 9, 10, 10, 11, 11, 
12, 12, 12, 1, 1, 2, 2, 3, 3, 4, 4, 5, 5, 6, 6, 7, 7, 7, 8, 8, 
9, 9, 10, 10, 11, 11, 12, 12, 12, 1, 1, 2, 2, 3, 3, 4, 4, 5, 
5, 6, 6, 6, 7, 7, 8, 8, 9, 9, 10, 10, 11, 11, 12, 12, 12, 1, 
1, 2, 2, 3, 3, 4, 4, 5, 5, 5, 6, 6, 7, 7, 8, 8, 9, 9, 10, 10), 
    year = c(2017, 2017, 2017, 2017, 2017, 2017, 2017, 2017, 
    2017, 2017, 2017, 2017, 2017, 2017, 2017, 2017, 2017, 2017, 
    2017, 2017, 2017, 2017, 2017, 2017, 2017, 2017, 2017, 2018, 
    2018, 2018, 2018, 2018, 2018, 2018, 2018, 2018, 2018, 2018, 
    2018, 2018, 2018, 2018, 2018, 2018, 2018, 2018, 2018, 2018, 
    2018, 2018, 2018, 2018, 2018, 2019, 2019, 2019, 2019, 2019, 
    2019, 2019, 2019, 2019, 2019, 2019, 2019, 2019, 2019, 2019, 
    2019, 2019, 2019, 2019, 2019, 2019, 2019, 2019, 2019, 2019, 
    2019, 2020, 2020, 2020, 2020, 2020, 2020, 2020, 2020, 2020, 
    2020, 2020, 2020, 2020, 2020, 2020, 2020, 2020, 2020, 2020, 
    2020, 2020), dummy = c(0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 
    0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 1L, 0L, 0L, 0L, 0L, 
    0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 
    1L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 
    0L, 0L, 0L, 0L, 0L, 1L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 
    0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 1L, 0L, 0L, 0L, 0L, 
    0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 
    1L)), row.names = c(NA, -100L), class = c("tbl_df", "tbl", 
"data.frame"))

Upvotes: 0

Views: 547

Answers (1)

StupidWolf
StupidWolf

Reputation: 46898

If I take the data.frame output you provided, and I do the same mutate call, I get a totally different output:

df %>% group_by(month,year) %>% mutate(ismax=as.integer(var1==max(var1)))
# A tibble: 100 x 6
# Groups:   month, year [46]
   date        var1 month  year dummy ismax
   <date>     <dbl> <dbl> <dbl> <int> <int>
 1 2017-01-01  7.61     1  2017     0     1
 2 2017-01-15  4.40     1  2017     0     0
 3 2017-01-29  7.29     1  2017     0     0
 4 2017-02-12  4.65     2  2017     0     0
 5 2017-02-26  5.99     2  2017     0     1
 6 2017-03-12  7.89     3  2017     0     1
 7 2017-03-26  7.37     3  2017     0     0
 8 2017-04-09  3.56     4  2017     0     1
 9 2017-04-23  2.05     4  2017     0     0
10 2017-05-07  7.86     5  2017     0     0

These are my system settings, is the dplyr version too old or did you load another package that clash?

R version 3.6.1 (2019-07-05)
Platform: x86_64-apple-darwin15.6.0 (64-bit)
Running under: OS X El Capitan 10.11.6

Matrix products: default
BLAS:   /Library/Frameworks/R.framework/Versions/3.6/Resources/lib/libRblas.0.dylib
LAPACK: /Library/Frameworks/R.framework/Versions/3.6/Resources/lib/libRlapack.dylib

locale:
[1] en_US.UTF-8/en_US.UTF-8/en_US.UTF-8/C/en_US.UTF-8/en_US.UTF-8

attached base packages:
[1] stats     graphics  grDevices utils     datasets  methods   base     

other attached packages:
[1] dplyr_0.8.3

loaded via a namespace (and not attached):
 [1] Rcpp_1.0.2       fansi_0.4.0      zeallot_0.1.0    utf8_1.1.4      
 [5] crayon_1.3.4     assertthat_0.2.1 R6_2.4.0         backports_1.1.5 
 [9] magrittr_1.5     pillar_1.4.2     cli_1.1.0        rlang_0.4.1     
[13] vctrs_0.2.0      glue_1.3.1       purrr_0.3.3      compiler_3.6.1  
[17] pkgconfig_2.0.3  tidyselect_0.2.5 tibble_2.1.3  

Otherwise you can try this way:

myfunc = function(x)as.integer(x$var1==max(x$var1))
df %>% group_by(year,month) %>% do(data.frame(. , ismax = myfunc(.)))
# A tibble: 100 x 6
# Groups:   year, month [46]
   date        var1 month  year dummy ismax
   <date>     <dbl> <dbl> <dbl> <int> <int>
 1 2017-01-01  7.61     1  2017     0     1
 2 2017-01-15  4.40     1  2017     0     0
 3 2017-01-29  7.29     1  2017     0     0
 4 2017-02-12  4.65     2  2017     0     0
 5 2017-02-26  5.99     2  2017     0     1
 6 2017-03-12  7.89     3  2017     0     1
 7 2017-03-26  7.37     3  2017     0     0
 8 2017-04-09  3.56     4  2017     0     1
 9 2017-04-23  2.05     4  2017     0     0
10 2017-05-07  7.86     5  2017     0     0
# … with 90 more rows

Upvotes: 1

Related Questions