Reputation: 3195
Here is an example of my dataset:
disk=structure(list(competitor_id = c(2L, 2L, 2L, 2L, 2L, 2L, 8L,
8L, 8L, 8L, 8L, 8L), comp_article_id = c("7976DD6D-38BD-D902-F7A8-A015B37A7128",
"7976DD6D-38BD-D902-F7A8-A015B37A7128", "7976DD6D-38BD-D902-F7A8-A015B37A7128",
"5683D414-DED2-1B4C-FD03-200B56319104", "5683D414-DED2-1B4C-FD03-200B56319104",
"5683D414-DED2-1B4C-FD03-200B56319104", "7976DD6D-38BD-D902-F7A8-A015B37A7128",
"7976DD6D-38BD-D902-F7A8-A015B37A7128", "7976DD6D-38BD-D902-F7A8-A015B37A7128",
"5683D414-DED2-1B4C-FD03-200B56319104", "5683D414-DED2-1B4C-FD03-200B56319104",
"5683D414-DED2-1B4C-FD03-200B56319104"), MDM_Key = c(16L, 16L,
16L, 49L, 49L, 49L, 16L, 16L, 16L, 49L, 49L, 49L), subgroup_id = c("04.01.2001",
"04.01.2001", "04.01.2001", "04.01.2001", "04.01.2001", "04.01.2001",
"04.01.2001", "04.01.2001", "04.01.2001", "04.01.2001", "04.01.2001",
"04.01.2001"), month_id = c(202211L, 202211L, 202212L, 202003L,
202004L, 202007L, 202211L, 202211L, 202212L, 202003L, 202004L,
202007L), comp_disc_price = c(1207.06, 1207.06, 1207.06, 720.58,
774.62, 774.62, 1207.06, 1207.06, 1207.06, 720.58, 774.62, 774.62
), iek_disc_price = c(553.83, 553.83, 537.7, 665.83, 665.83,
665.83, 553.83, 553.83, 537.7, 665.83, 665.83, 665.83)), class = "data.frame", row.names = c(NA,
-12L))
I need for the comp_disc_price
and iek_disc_price
variables to calculate the difference in values by month (month_id). Here is a look at the variables:
month_id comp_disc_price iek_disc_price
202210 1207,06 553,83
202211 1300 553,83
I need to subtract the smaller month from the larger month:
1300-1207,06=92,4 (comp_disc_price) (202211-202210)
553-553=0 (iek_disc_price)
For this, I used this good solution from the topic before last:
disk %>%
arrange(competitor_id, month_id) %>%
group_by(competitor_id,MDM_Key) %>%
mutate(across(c(comp_disc_price, iek_disc_price), ~ c(diff(.), NA))) %>%
ungroup()
How can I do that in addition to the absolute values? These were results in percentage terms. For example, the desired output is:
competitor_id comp_article_id MDM_Key
1 2 7976DD6D-38BD-D902-F7A8-A015B37A7128 16
2 2 7976DD6D-38BD-D902-F7A8-A015B37A7128 16
3 2 7976DD6D-38BD-D902-F7A8-A015B37A7128 16
4 2 5683D414-DED2-1B4C-FD03-200B56319104 49
5 2 5683D414-DED2-1B4C-FD03-200B56319104 49
subgroup_id month_id comp_disc_price iek_disc_price abs_comp_disc
1 04.01.2001 202210 1207.06 553.83 92.94
2 04.01.2001 202211 1300.00 553.83 -50.00
3 04.01.2001 202212 1250.00 537.70 NA
4 04.01.2001 202003 720.58 665.83 54.04
5 04.01.2001 202004 774.62 665.83 NA
abs_iek_disk perc_comp_disk perc_iek_disk
1 0.00 0.07699700 0.00000000
2 -16.13 -0.03846154 -0.02912446
3 NA NA NA
4 0.00 0.00000000 0.00000000
5 NA NA NA
How to add percentages to the calculation and keep the original values of the variables comp_disc_price
and iek_disc_price
?
As always, thanks for your valuable help.
Upvotes: 0
Views: 50
Reputation: 123893
Using dplyr::lag()
, adding a function to compute the percentage change, naming your functions and using the .names
argument of across
you could do:
library(dplyr, warn = FALSE)
disk %>%
arrange(competitor_id, month_id) %>%
group_by(competitor_id, MDM_Key) %>%
mutate(across(c(comp_disc_price, iek_disc_price),
list(abs = ~ .x - lag(.x), perc = ~ .x / lag(.x) - 1),
.names = "{.fn}_{.col}"
)) %>%
ungroup()
#> # A tibble: 12 × 11
#> competitor_id comp_…¹ MDM_Key subgr…² month…³ comp_…⁴ iek_d…⁵ abs_c…⁶ perc_…⁷
#> <int> <chr> <int> <chr> <int> <dbl> <dbl> <dbl> <dbl>
#> 1 2 5683D4… 49 04.01.… 202003 721. 666. NA NA
#> 2 2 5683D4… 49 04.01.… 202004 775. 666. 54.0 0.0750
#> 3 2 5683D4… 49 04.01.… 202007 775. 666. 0 0
#> 4 2 7976DD… 16 04.01.… 202211 1207. 554. NA NA
#> 5 2 7976DD… 16 04.01.… 202211 1207. 554. 0 0
#> 6 2 7976DD… 16 04.01.… 202212 1207. 538. 0 0
#> 7 8 5683D4… 49 04.01.… 202003 721. 666. NA NA
#> 8 8 5683D4… 49 04.01.… 202004 775. 666. 54.0 0.0750
#> 9 8 5683D4… 49 04.01.… 202007 775. 666. 0 0
#> 10 8 7976DD… 16 04.01.… 202211 1207. 554. NA NA
#> 11 8 7976DD… 16 04.01.… 202211 1207. 554. 0 0
#> 12 8 7976DD… 16 04.01.… 202212 1207. 538. 0 0
#> # … with 2 more variables: abs_iek_disc_price <dbl>, perc_iek_disc_price <dbl>,
#> # and abbreviated variable names ¹comp_article_id, ²subgroup_id, ³month_id,
#> # ⁴comp_disc_price, ⁵iek_disc_price, ⁶abs_comp_disc_price,
#> # ⁷perc_comp_disc_price
Upvotes: 1