Reputation: 554
Building off of this question, I'm trying to create the same dummy variable for the maximum value of each group, but instead of just grouping by the month column, I'd like to group on both month and year so I get the maximum value for both, i.e. a dummy for the max value in 1-2017, 2-2017 , etc.
I've created a demo dataset here as well as some code showing the undesired result when I try to group by two columns.
df<- data.frame(date= seq.Date(from = as.Date('2017-01-01'), by= 14, length.out = 100), var1= rnorm(20, 5, 3))
df$month<- as.numeric(strftime(df$date, "%m"))
df$year <- as.numeric(strftime(df$date, "%Y"))
df <- df %>%
group_by(year, month) %>%
mutate(dummy = as.integer(var1 == max(var1))) %>%
ungroup
When I run this code, this is the output I get.
structure(list(date = structure(c(17167L, 17181L, 17195L, 17209L,
17223L, 17237L, 17251L, 17265L, 17279L, 17293L, 17307L, 17321L,
17335L, 17349L, 17363L, 17377L, 17391L, 17405L, 17419L, 17433L,
17447L, 17461L, 17475L, 17489L, 17503L, 17517L, 17531L, 17545L,
17559L, 17573L, 17587L, 17601L, 17615L, 17629L, 17643L, 17657L,
17671L, 17685L, 17699L, 17713L, 17727L, 17741L, 17755L, 17769L,
17783L, 17797L, 17811L, 17825L, 17839L, 17853L, 17867L, 17881L,
17895L, 17909L, 17923L, 17937L, 17951L, 17965L, 17979L, 17993L,
18007L, 18021L, 18035L, 18049L, 18063L, 18077L, 18091L, 18105L,
18119L, 18133L, 18147L, 18161L, 18175L, 18189L, 18203L, 18217L,
18231L, 18245L, 18259L, 18273L, 18287L, 18301L, 18315L, 18329L,
18343L, 18357L, 18371L, 18385L, 18399L, 18413L, 18427L, 18441L,
18455L, 18469L, 18483L, 18497L, 18511L, 18525L, 18539L, 18553L
), class = "Date"), var1 = c(7.610410421533, 4.39838519593849,
7.29367017299385, 4.65490632032971, 5.98987372646137, 7.89362434824241,
7.37466247923975, 3.55735383387344, 2.04726523393801, 7.85753545892588,
8.9757840439474, 3.03010111198631, 8.12870854721674, 4.94313056382185,
10.3008527572542, 3.38195982516332, 2.38095744572549, 2.33860273928046,
4.55108829246478, 10.3054094468083, 7.610410421533, 4.39838519593849,
7.29367017299385, 4.65490632032971, 5.98987372646137, 7.89362434824241,
7.37466247923975, 3.55735383387344, 2.04726523393801, 7.85753545892588,
8.9757840439474, 3.03010111198631, 8.12870854721674, 4.94313056382185,
10.3008527572542, 3.38195982516332, 2.38095744572549, 2.33860273928046,
4.55108829246478, 10.3054094468083, 7.610410421533, 4.39838519593849,
7.29367017299385, 4.65490632032971, 5.98987372646137, 7.89362434824241,
7.37466247923975, 3.55735383387344, 2.04726523393801, 7.85753545892588,
8.9757840439474, 3.03010111198631, 8.12870854721674, 4.94313056382185,
10.3008527572542, 3.38195982516332, 2.38095744572549, 2.33860273928046,
4.55108829246478, 10.3054094468083, 7.610410421533, 4.39838519593849,
7.29367017299385, 4.65490632032971, 5.98987372646137, 7.89362434824241,
7.37466247923975, 3.55735383387344, 2.04726523393801, 7.85753545892588,
8.9757840439474, 3.03010111198631, 8.12870854721674, 4.94313056382185,
10.3008527572542, 3.38195982516332, 2.38095744572549, 2.33860273928046,
4.55108829246478, 10.3054094468083, 7.610410421533, 4.39838519593849,
7.29367017299385, 4.65490632032971, 5.98987372646137, 7.89362434824241,
7.37466247923975, 3.55735383387344, 2.04726523393801, 7.85753545892588,
8.9757840439474, 3.03010111198631, 8.12870854721674, 4.94313056382185,
10.3008527572542, 3.38195982516332, 2.38095744572549, 2.33860273928046,
4.55108829246478, 10.3054094468083), month = c(1, 1, 1, 2, 2,
3, 3, 4, 4, 5, 5, 6, 6, 7, 7, 7, 8, 8, 9, 9, 10, 10, 11, 11,
12, 12, 12, 1, 1, 2, 2, 3, 3, 4, 4, 5, 5, 6, 6, 7, 7, 7, 8, 8,
9, 9, 10, 10, 11, 11, 12, 12, 12, 1, 1, 2, 2, 3, 3, 4, 4, 5,
5, 6, 6, 6, 7, 7, 8, 8, 9, 9, 10, 10, 11, 11, 12, 12, 12, 1,
1, 2, 2, 3, 3, 4, 4, 5, 5, 5, 6, 6, 7, 7, 8, 8, 9, 9, 10, 10),
year = c(2017, 2017, 2017, 2017, 2017, 2017, 2017, 2017,
2017, 2017, 2017, 2017, 2017, 2017, 2017, 2017, 2017, 2017,
2017, 2017, 2017, 2017, 2017, 2017, 2017, 2017, 2017, 2018,
2018, 2018, 2018, 2018, 2018, 2018, 2018, 2018, 2018, 2018,
2018, 2018, 2018, 2018, 2018, 2018, 2018, 2018, 2018, 2018,
2018, 2018, 2018, 2018, 2018, 2019, 2019, 2019, 2019, 2019,
2019, 2019, 2019, 2019, 2019, 2019, 2019, 2019, 2019, 2019,
2019, 2019, 2019, 2019, 2019, 2019, 2019, 2019, 2019, 2019,
2019, 2020, 2020, 2020, 2020, 2020, 2020, 2020, 2020, 2020,
2020, 2020, 2020, 2020, 2020, 2020, 2020, 2020, 2020, 2020,
2020, 2020), dummy = c(0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L,
0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 1L, 0L, 0L, 0L, 0L,
0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L,
1L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L,
0L, 0L, 0L, 0L, 0L, 1L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L,
0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 1L, 0L, 0L, 0L, 0L,
0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L,
1L)), row.names = c(NA, -100L), class = c("tbl_df", "tbl",
"data.frame"))
Upvotes: 0
Views: 547
Reputation: 46898
If I take the data.frame output you provided, and I do the same mutate call, I get a totally different output:
df %>% group_by(month,year) %>% mutate(ismax=as.integer(var1==max(var1)))
# A tibble: 100 x 6
# Groups: month, year [46]
date var1 month year dummy ismax
<date> <dbl> <dbl> <dbl> <int> <int>
1 2017-01-01 7.61 1 2017 0 1
2 2017-01-15 4.40 1 2017 0 0
3 2017-01-29 7.29 1 2017 0 0
4 2017-02-12 4.65 2 2017 0 0
5 2017-02-26 5.99 2 2017 0 1
6 2017-03-12 7.89 3 2017 0 1
7 2017-03-26 7.37 3 2017 0 0
8 2017-04-09 3.56 4 2017 0 1
9 2017-04-23 2.05 4 2017 0 0
10 2017-05-07 7.86 5 2017 0 0
These are my system settings, is the dplyr version too old or did you load another package that clash?
R version 3.6.1 (2019-07-05)
Platform: x86_64-apple-darwin15.6.0 (64-bit)
Running under: OS X El Capitan 10.11.6
Matrix products: default
BLAS: /Library/Frameworks/R.framework/Versions/3.6/Resources/lib/libRblas.0.dylib
LAPACK: /Library/Frameworks/R.framework/Versions/3.6/Resources/lib/libRlapack.dylib
locale:
[1] en_US.UTF-8/en_US.UTF-8/en_US.UTF-8/C/en_US.UTF-8/en_US.UTF-8
attached base packages:
[1] stats graphics grDevices utils datasets methods base
other attached packages:
[1] dplyr_0.8.3
loaded via a namespace (and not attached):
[1] Rcpp_1.0.2 fansi_0.4.0 zeallot_0.1.0 utf8_1.1.4
[5] crayon_1.3.4 assertthat_0.2.1 R6_2.4.0 backports_1.1.5
[9] magrittr_1.5 pillar_1.4.2 cli_1.1.0 rlang_0.4.1
[13] vctrs_0.2.0 glue_1.3.1 purrr_0.3.3 compiler_3.6.1
[17] pkgconfig_2.0.3 tidyselect_0.2.5 tibble_2.1.3
Otherwise you can try this way:
myfunc = function(x)as.integer(x$var1==max(x$var1))
df %>% group_by(year,month) %>% do(data.frame(. , ismax = myfunc(.)))
# A tibble: 100 x 6
# Groups: year, month [46]
date var1 month year dummy ismax
<date> <dbl> <dbl> <dbl> <int> <int>
1 2017-01-01 7.61 1 2017 0 1
2 2017-01-15 4.40 1 2017 0 0
3 2017-01-29 7.29 1 2017 0 0
4 2017-02-12 4.65 2 2017 0 0
5 2017-02-26 5.99 2 2017 0 1
6 2017-03-12 7.89 3 2017 0 1
7 2017-03-26 7.37 3 2017 0 0
8 2017-04-09 3.56 4 2017 0 1
9 2017-04-23 2.05 4 2017 0 0
10 2017-05-07 7.86 5 2017 0 0
# … with 90 more rows
Upvotes: 1