Reputation: 1
I'm trying to reduce and simplify a dataset from our store order tool. Our tools export each order row as a new order with repeated values. For example, current export is like:
ORDER PRODUCT STORE OPTIONAL PRICE
<chr> <chr> <chr> <chr> <dbl>
1 ord1 p1 s1 Y 3
2 ord2 p2 s2 N 2
3 ord3 p3 s3 Y 3
4 ord3 p2 s3 N 5
5 ord3 p2 s3 Y 6
6 ord4 p3 s2 Y 1
Here the code for the example:
data <- tibble("ORDER"=c("ord1","ord2","ord3","ord3","ord3","ord4"),"PRODUCT"=c("p1","p2","p3","p2","p2","p3"),STORE=c("s1","s2","s3","s3","s3","s2"),"OPTIONAL"=c("Y","N","Y","N","Y","Y"),"PRICE"=c(3,2,3,5,6,1))
My goal is to create a dataset with:
I've managed to pivot the table with:
> data_wide <- data %>%
+ pivot_wider(names_from = PRODUCT, values_from = PRICE, values_fill = NA ,
+ values_fn = list(PRICE = sum))
Now I have this tibble:
ORDER STORE OPTIONAL p1 p2 p3
<chr> <chr> <chr> <dbl> <dbl> <dbl>
1 ord1 s1 Y 3 NA NA
2 ord2 s2 N NA 2 NA
3 ord3 s3 Y NA 6 3
4 ord3 s3 N NA 5 NA
5 ord4 s2 Y NA NA 1
I want to create a table like:
ORDER STORE OPTIONAL p1 p2 p3
<chr> <chr> <chr> <dbl> <dbl> <dbl>
1 ord1 s1 Y 3 NA NA
2 ord2 s2 N NA 2 NA
3 ord3 s3 Y NA 11 3
4 ord4 s2 Y NA NA NA
Where for each product I have the sum of price paid, and I know if the client have asked for an optional (I don't care for which product). I don't know how to face it:
Thanks!
Upvotes: 0
Views: 43
Reputation: 8880
solution option
library(tidyverse)
data <-
tibble(
"ORDER" = c("ord1", "ord2", "ord3", "ord3", "ord3", "ord4"),
"PRODUCT" = c("p1", "p2", "p3", "p2", "p2", "p3"),
STORE = c("s1", "s2", "s3", "s3", "s3", "s2"),
"OPTIONAL" = c("Y", "N", "Y", "N", "Y", "Y"),
"PRICE" = c(3, 2, 3, 5, 6, 1)
)
data %>%
pivot_wider(
names_from = PRODUCT,
values_from = c(PRICE),
values_fill = NA ,
values_fn = list(PRICE = sum)
) %>%
group_by(ORDER, STORE) %>%
summarise(
OPTIONAL = if_else(sum(OPTIONAL == "Y") > 0, "Y", "N"),
across(starts_with("p"), ~sum(.x)), .groups = "drop"
)
#> # A tibble: 4 x 6
#> ORDER STORE OPTIONAL p1 p2 p3
#> <chr> <chr> <chr> <dbl> <dbl> <dbl>
#> 1 ord1 s1 Y 3 NA NA
#> 2 ord2 s2 N NA 2 NA
#> 3 ord3 s3 Y NA 11 NA
#> 4 ord4 s2 Y NA NA 1
Created on 2021-07-04 by the reprex package (v2.0.0)
Upvotes: 0
Reputation: 16998
You can use dplyr
and tidyr
:
library(dplyr)
library(tiydr)
data %>%
pivot_wider(names_from="PRODUCT", values_from="PRICE", values_fill=0) %>%
mutate(OPTIONAL = OPTIONAL == "Y") %>%
group_by(ORDER, STORE) %>%
summarise(across(c("OPTIONAL", matches("p\\d+")), ~ sum(.x)), .groups="drop") %>%
mutate(OPTIONAL = ifelse(OPTIONAL > 0, "Y", "N"),
across(matches("p\\d+"), ~na_if(.x,0)))
returns
# A tibble: 4 x 6
ORDER STORE OPTIONAL p1 p2 p3
<chr> <chr> <chr> <dbl> <dbl> <dbl>
1 ord1 s1 Y 3 NA NA
2 ord2 s2 N NA 2 NA
3 ord3 s3 Y NA 11 3
4 ord4 s2 Y NA NA 1
Upvotes: 1