psysky
psysky

Reputation: 3195

How to add percentages to the calculation in R

Here is an example of my dataset:

disk=structure(list(competitor_id = c(2L, 2L, 2L, 2L, 2L, 2L, 8L, 
8L, 8L, 8L, 8L, 8L), comp_article_id = c("7976DD6D-38BD-D902-F7A8-A015B37A7128", 
"7976DD6D-38BD-D902-F7A8-A015B37A7128", "7976DD6D-38BD-D902-F7A8-A015B37A7128", 
"5683D414-DED2-1B4C-FD03-200B56319104", "5683D414-DED2-1B4C-FD03-200B56319104", 
"5683D414-DED2-1B4C-FD03-200B56319104", "7976DD6D-38BD-D902-F7A8-A015B37A7128", 
"7976DD6D-38BD-D902-F7A8-A015B37A7128", "7976DD6D-38BD-D902-F7A8-A015B37A7128", 
"5683D414-DED2-1B4C-FD03-200B56319104", "5683D414-DED2-1B4C-FD03-200B56319104", 
"5683D414-DED2-1B4C-FD03-200B56319104"), MDM_Key = c(16L, 16L, 
16L, 49L, 49L, 49L, 16L, 16L, 16L, 49L, 49L, 49L), subgroup_id = c("04.01.2001", 
"04.01.2001", "04.01.2001", "04.01.2001", "04.01.2001", "04.01.2001", 
"04.01.2001", "04.01.2001", "04.01.2001", "04.01.2001", "04.01.2001", 
"04.01.2001"), month_id = c(202211L, 202211L, 202212L, 202003L, 
202004L, 202007L, 202211L, 202211L, 202212L, 202003L, 202004L, 
202007L), comp_disc_price = c(1207.06, 1207.06, 1207.06, 720.58, 
774.62, 774.62, 1207.06, 1207.06, 1207.06, 720.58, 774.62, 774.62
), iek_disc_price = c(553.83, 553.83, 537.7, 665.83, 665.83, 
665.83, 553.83, 553.83, 537.7, 665.83, 665.83, 665.83)), class = "data.frame", row.names = c(NA, 
-12L))

I need for the comp_disc_price and iek_disc_price variables to calculate the difference in values by month (month_id). Here is a look at the variables:

month_id    comp_disc_price iek_disc_price
202210             1207,06  553,83
202211              1300    553,83

I need to subtract the smaller month from the larger month:

1300-1207,06=92,4 (comp_disc_price) (202211-202210)
553-553=0 (iek_disc_price)

For this, I used this good solution from the topic before last:

disk %>%
  arrange(competitor_id, month_id) %>%
  group_by(competitor_id,MDM_Key) %>%
  mutate(across(c(comp_disc_price, iek_disc_price), ~ c(diff(.), NA))) %>%
  ungroup()

How can I do that in addition to the absolute values? ​​These were results in percentage terms. For example, the desired output is:

  competitor_id                      comp_article_id MDM_Key
1             2 7976DD6D-38BD-D902-F7A8-A015B37A7128      16
2             2 7976DD6D-38BD-D902-F7A8-A015B37A7128      16
3             2 7976DD6D-38BD-D902-F7A8-A015B37A7128      16
4             2 5683D414-DED2-1B4C-FD03-200B56319104      49
5             2 5683D414-DED2-1B4C-FD03-200B56319104      49
  subgroup_id month_id comp_disc_price iek_disc_price abs_comp_disc
1  04.01.2001   202210         1207.06         553.83         92.94
2  04.01.2001   202211         1300.00         553.83        -50.00
3  04.01.2001   202212         1250.00         537.70            NA
4  04.01.2001   202003          720.58         665.83         54.04
5  04.01.2001   202004          774.62         665.83            NA
  abs_iek_disk perc_comp_disk perc_iek_disk
1         0.00     0.07699700    0.00000000
2       -16.13    -0.03846154   -0.02912446
3           NA             NA            NA
4         0.00     0.00000000    0.00000000

5           NA             NA            NA

How to add percentages to the calculation and keep the original values of the variables comp_disc_price and iek_disc_price?

As always, thanks for your valuable help.

Upvotes: 0

Views: 50

Answers (1)

stefan
stefan

Reputation: 123893

Using dplyr::lag(), adding a function to compute the percentage change, naming your functions and using the .names argument of across you could do:

library(dplyr, warn = FALSE)

disk %>%
  arrange(competitor_id, month_id) %>%
  group_by(competitor_id, MDM_Key) %>%
  mutate(across(c(comp_disc_price, iek_disc_price),
    list(abs = ~ .x - lag(.x), perc = ~ .x / lag(.x) - 1),
    .names = "{.fn}_{.col}"
  )) %>%
  ungroup()
#> # A tibble: 12 × 11
#>    competitor_id comp_…¹ MDM_Key subgr…² month…³ comp_…⁴ iek_d…⁵ abs_c…⁶ perc_…⁷
#>            <int> <chr>     <int> <chr>     <int>   <dbl>   <dbl>   <dbl>   <dbl>
#>  1             2 5683D4…      49 04.01.…  202003    721.    666.    NA   NA     
#>  2             2 5683D4…      49 04.01.…  202004    775.    666.    54.0  0.0750
#>  3             2 5683D4…      49 04.01.…  202007    775.    666.     0    0     
#>  4             2 7976DD…      16 04.01.…  202211   1207.    554.    NA   NA     
#>  5             2 7976DD…      16 04.01.…  202211   1207.    554.     0    0     
#>  6             2 7976DD…      16 04.01.…  202212   1207.    538.     0    0     
#>  7             8 5683D4…      49 04.01.…  202003    721.    666.    NA   NA     
#>  8             8 5683D4…      49 04.01.…  202004    775.    666.    54.0  0.0750
#>  9             8 5683D4…      49 04.01.…  202007    775.    666.     0    0     
#> 10             8 7976DD…      16 04.01.…  202211   1207.    554.    NA   NA     
#> 11             8 7976DD…      16 04.01.…  202211   1207.    554.     0    0     
#> 12             8 7976DD…      16 04.01.…  202212   1207.    538.     0    0     
#> # … with 2 more variables: abs_iek_disc_price <dbl>, perc_iek_disc_price <dbl>,
#> #   and abbreviated variable names ¹​comp_article_id, ²​subgroup_id, ³​month_id,
#> #   ⁴​comp_disc_price, ⁵​iek_disc_price, ⁶​abs_comp_disc_price,
#> #   ⁷​perc_comp_disc_price

Upvotes: 1

Related Questions