ThingyBlahBlah3
ThingyBlahBlah3

Reputation: 51

R - rotating a difficult data frame

Suppose I've got sales data for three sales reps selling a variety of products. What makes it difficult is that each rep sells a different combination of products, and not necessarily the same number of them, either:

Bob sells Products A, B, and C

Mike sells Products A, B, C, and D

Sara sells Products A, B, and E

   RepName Product SalesDollarAmt SalesQty
 1     Bob       A             43        3
 2    Mike       A             14        5
 3    Sara       A             53        1
 4     Bob       B            100       35
 5    Mike       B            215       80
 6    Sara       B            310      105
 7     Bob       C              5        8
 8    Mike       C             10        3
 9    Mike       D            105       50
10    Sara       E             25       18

I want to rotate this on the Product, so that the result looks like this:

  RepName Product.1 SalesDollarAmt.1 SalesQty.1 Product.2 SalesDollarAmt.2 SalesQty.2 Product.3 SalesDollarAmt.3 SalesQty.3 Product.4 SalesDollarAmt.4 SalesQty.4
1     Bob         A               43          3         B              100         35         C                5          8      <NA>                0          0
2    Mike         A               14          5         B              215         80         C               10          3         D              105         50
3    Sara         A               53          1         B              310        105         E               25         18      <NA>                0          0

If they all had the same products, I'd have filtered them by Product into separate dataframes and then joined them back together on RepName. I've tried everything I can think of with spread and dcast. Thanks for any help!

Code for sample data frames:

library(tidyverse)

# initial sales data
df <- tribble(
  ~RepName, ~Product, ~SalesDollarAmt, ~SalesQty,
               #-------------------------------
               "Bob", "A", 43, 3,
               "Mike", "A", 14, 5,
               "Sara", "A", 53, 1,
               "Bob", "B", 100, 35,
               "Mike", "B", 215, 80,
               "Sara", "B", 310, 105,
               "Bob", "C", 5, 8,
               "Mike", "C", 10, 3,
               "Mike", "D", 105, 50,
               "Sara", "E", 25, 18
                )

# ideally rotated data
df2 <- tribble(
  ~RepName, ~Product.1, ~SalesDollarAmt.1, ~SalesQty.1, ~Product.2, ~SalesDollarAmt.2, ~SalesQty.2, ~Product.3, ~SalesDollarAmt.3, ~SalesQty.3, ~Product.4, ~SalesDollarAmt.4, ~SalesQty.4,
  #--------------------------------------------------------------
  "Bob", "A", 43, 3, "B", 100, 35, "C", 5, 8, NA, 0, 0, 
  "Mike", "A", 14, 5, "B", 215, 80, "C", 10, 3, "D", 105, 50,
  "Sara", "A", 53, 1, "B", 310, 105, "E", 25, 18, NA, 0, 0 
)

Upvotes: 1

Views: 73

Answers (2)

Uwe
Uwe

Reputation: 42564

The question is tagged with dcast, so I feel obliged to post a solution which uses dcast().

The data.table version of dcast() can reshape multiple value columns simultaneously which is exactly what we need here. In addition, the rowid() function is used to fill the columns individually for each RepName:

library(data.table)
cast(setDT(df), RepName ~ rowid(RepName), value.var = c("Product", "SalesDollarAmt", "SalesQty"))
   RepName Product_1 Product_2 Product_3 Product_4 SalesDollarAmt_1 SalesDollarAmt_2 SalesDollarAmt_3 SalesDollarAmt_4 SalesQty_1 SalesQty_2 SalesQty_3 SalesQty_4
1:     Bob         A         B         C        NA               43              100                5               NA          3         35          8         NA
2:    Mike         A         B         C         D               14              215               10              105          5         80          3         50
3:    Sara         A         B         E        NA               53              310               25               NA          1        105         18         NA

Edit: Improved version with columns in requested order

In a comment, the OP has disclosed that the reshaping is required because the data will be further processed by an Excel macro. Usually, the position of columns is crucial for Excel formulae.

Therefore the variant below is reordering the columns so that all columns which belong to one product are grouped together:

library(data.table)
# value columns
val <- c("Product", "SalesDollarAmt", "SalesQty")
# create vector of column names in the expected order
col_order <- setDT(df)[, .N, by = RepName][, CJ(seq_len(max(N)), val)][, paste(V2, V1, sep = "_")]
dcast(df, RepName ~ rowid(RepName), value.var = val)[
  #re-order columns in place, i.e., without copying
  , setcolorder(.SD, c("RepName", col_order))]
   RepName Product_1 SalesDollarAmt_1 SalesQty_1 Product_2 SalesDollarAmt_2 SalesQty_2 Product_3 SalesDollarAmt_3 SalesQty_3 Product_4 SalesDollarAmt_4 SalesQty_4
1:     Bob         A               43          3         B              100         35         C                5          8        NA               NA         NA
2:    Mike         A               14          5         B              215         80         C               10          3         D              105         50
3:    Sara         A               53          1         B              310        105         E               25         18        NA               NA         NA

Upvotes: 1

bouncyball
bouncyball

Reputation: 10771

Using a combination of row_number, gather, spread, and unite, we can reshape the data. It's up to you to reorder the columns if you so choose. In the last line, we specify convert = TRUE in the call to spread. This is due to the fact that when we convert the data to long format (using gather), the column values are converted to character. Specifying convert = TRUE in the call to spread (should) get the values back to a useful form.

df %>%
  group_by(RepName) %>%
  mutate(product_count = row_number()) %>% # product "id" within RepName
  gather(variable, value, -RepName, -product_count) %>% # reshape to long
  unite(var_prod, variable, product_count) %>%
  spread(var_prod, value, convert = TRUE) # reshape to wide

  RepName Product_1 Product_2 Product_3 Product_4 SalesDollarAmt_1 SalesDollarAmt_2 SalesDollarAmt_3 SalesDollarAmt_4 SalesQty_1 SalesQty_2 SalesQty_3 SalesQty_4
1     Bob         A         B         C      <NA>               43              100                5             <NA>          3         35          8       <NA>
2    Mike         A         B         C         D               14              215               10              105          5         80          3         50
3    Sara         A         B         E      <NA>               53              310               25             <NA>          1        105         18       <NA>

Upvotes: 1

Related Questions