Reputation: 1

Summarize Rows with

I'm trying to reduce and simplify a dataset from our store order tool. Our tools export each order row as a new order with repeated values. For example, current export is like:

  ORDER PRODUCT STORE OPTIONAL PRICE
  <chr> <chr>   <chr> <chr>    <dbl>
1 ord1  p1      s1    Y            3
2 ord2  p2      s2    N            2
3 ord3  p3      s3    Y            3
4 ord3  p2      s3    N            5
5 ord3  p2      s3    Y            6
6 ord4  p3      s2    Y            1

Here the code for the example:

data <- tibble("ORDER"=c("ord1","ord2","ord3","ord3","ord3","ord4"),"PRODUCT"=c("p1","p2","p3","p2","p2","p3"),STORE=c("s1","s2","s3","s3","s3","s2"),"OPTIONAL"=c("Y","N","Y","N","Y","Y"),"PRICE"=c(3,2,3,5,6,1))

My goal is to create a dataset with:

the totale paid in each oder for a specific product (we have 220 different products)
if cliend asked or not for an optional

I've managed to pivot the table with:

    > data_wide <- data %>% 
+     pivot_wider(names_from = PRODUCT, values_from = PRICE, values_fill = NA ,
+                 values_fn = list(PRICE = sum))

Now I have this tibble:

  ORDER STORE OPTIONAL    p1    p2    p3
  <chr> <chr> <chr>    <dbl> <dbl> <dbl>
1 ord1  s1    Y            3    NA    NA
2 ord2  s2    N           NA     2    NA
3 ord3  s3    Y           NA     6     3
4 ord3  s3    N           NA     5    NA
5 ord4  s2    Y           NA    NA     1

I want to create a table like:

  ORDER STORE OPTIONAL    p1    p2    p3
  <chr> <chr> <chr>    <dbl> <dbl> <dbl>
1 ord1  s1    Y            3    NA    NA
2 ord2  s2    N           NA     2    NA
3 ord3  s3    Y           NA    11     3
4 ord4  s2    Y           NA    NA    NA

Where for each product I have the sum of price paid, and I know if the client have asked for an optional (I don't care for which product). I don't know how to face it:

I can't lost information about sigle purchase (for example store)
I need an unique ORDER id, because I'll join with the client information dataset

Thanks!

Upvotes: 0

Answers (2)

Yuriy Saraykin

Reputation: 8880

solution option

library(tidyverse)
data <-
  tibble(
    "ORDER" = c("ord1", "ord2", "ord3", "ord3", "ord3", "ord4"),
    "PRODUCT" = c("p1", "p2", "p3", "p2", "p2", "p3"),
    STORE = c("s1", "s2", "s3", "s3", "s3", "s2"),
    "OPTIONAL" = c("Y", "N", "Y", "N", "Y", "Y"),
    "PRICE" = c(3, 2, 3, 5, 6, 1)
  )

data %>%
  pivot_wider(
    names_from = PRODUCT,
    values_from = c(PRICE),
    values_fill = NA ,
    values_fn = list(PRICE = sum)
  ) %>% 
  group_by(ORDER, STORE) %>% 
  summarise(
    OPTIONAL = if_else(sum(OPTIONAL == "Y") > 0, "Y", "N"),
    across(starts_with("p"), ~sum(.x)), .groups = "drop"
    )
#> # A tibble: 4 x 6
#>   ORDER STORE OPTIONAL    p1    p2    p3
#>   <chr> <chr> <chr>    <dbl> <dbl> <dbl>
#> 1 ord1  s1    Y            3    NA    NA
#> 2 ord2  s2    N           NA     2    NA
#> 3 ord3  s3    Y           NA    11    NA
#> 4 ord4  s2    Y           NA    NA     1

^{Created on 2021-07-04 by the reprex package (v2.0.0)}

Upvotes: 0

Martin Gal

Reputation: 16998

You can use dplyr and tidyr:

library(dplyr)
library(tiydr)

data %>% 
  pivot_wider(names_from="PRODUCT", values_from="PRICE", values_fill=0) %>% 
  mutate(OPTIONAL = OPTIONAL == "Y") %>% 
  group_by(ORDER, STORE) %>% 
  summarise(across(c("OPTIONAL", matches("p\\d+")), ~ sum(.x)), .groups="drop") %>% 
  mutate(OPTIONAL = ifelse(OPTIONAL > 0, "Y", "N"),
         across(matches("p\\d+"), ~na_if(.x,0)))

returns

# A tibble: 4 x 6
  ORDER STORE OPTIONAL    p1    p2    p3
  <chr> <chr> <chr>    <dbl> <dbl> <dbl>
1 ord1  s1    Y            3    NA    NA
2 ord2  s2    N           NA     2    NA
3 ord3  s3    Y           NA    11     3
4 ord4  s2    Y           NA    NA     1

Upvotes: 1

Summarize Rows with

Answers (2)

Related Questions