Reputation: 1
I have data which looks like this:-
Customer.Name Posting.Date SKU Sales.Amount($)
A 20150124 SKU1 12345
B 20150125 SKU2 22345
A 20150124 SKU3 52345
A 20150125 SKU4 55345
C 20150126 SKU6 62345
B 20150127 SKU7 72345
C 20150126 SKU8 72945
I'm trying to convert it into the following structures without column names:- Interim Data Structure:-
A 20150124 SKU1 SKU 3
A 20150125 SKU4
B 20150125 SKU2
C 20150126 SKU6 SKU8
B 20150127 SKU7
The interim output is sorted by posting date and each SKU purchased as part of a customer level transaction (identified by customer name + posting date) needs to get converted to a column.
Final data structure (below) needs to just contains SKUs purchased in transactions with the customer name and posting date columns removed from the interim output above.
SKU1 SKU3
SKU4
SKU2
SKU6 SKU8
SKU7
I have tried the following code to get the final structure but am getting an error.
library(dplyr)
library(tidyr)
Dataset2 <- Data %>%
group_by(Customer.Name, Posting.Date) %>%
summarise(SKU=toString(unique(SKU))) %>%
select(Customer.Name, Posting.Date,SKU) %>%
spread(Customer.Name,SKU,drop=FALSE)
The error I get is
Error: Each variable must be named. Problem variables: 2
Could you please guide on how to solve this problem?
Upvotes: 0
Views: 323
Reputation: 886938
It is better to have column names without special characters. If we remove it
colnames(Data)[4] <- "Sales.Amount"
The OP's code works fine
Dataset2
# A tibble: 4 × 4
# Posting.Date A B C
#* <int> <chr> <chr> <chr>
#1 20150124 SKU1, SKU3 <NA> <NA>
#2 20150125 SKU4 SKU2 <NA>
#3 20150126 <NA> <NA> SKU6, SKU8
#4 20150127 <NA> SKU7 <NA>
Or even with it, it works fine on tidyr_0.6.0
and dplyr_0.5.0
Upvotes: 0
Reputation: 160407
Here's one attempt. It has column names, though they can be discarded later as needed.
df <- data.frame(
V1=c( "A", "B", "A", "A", "C", "B", "C" ),
V2=c(20150124, 20150125, 20150124, 20150125, 20150126, 20150127, 20150126 ),
V3=c( "SKU1", "SKU2", "SKU3", "SKU4", "SKU6", "SKU7", "SKU8" ),
V4=c( 12345, 22345, 52345, 55345, 62345, 72345, 72945 ),
stringsAsFactors = FALSE
)
I'm inferring from your question that the fourth column is unnecessary, so I remove it.
library(dplyr)
library(tidyr)
out1 <- df %>%
select(-V4) %>%
group_by(V1, V2) %>%
mutate(row = row_number()) %>%
spread(row, V3) %>%
ungroup()
out1
# # A tibble: 5 × 4
# V1 V2 `1` `2`
# * <chr> <dbl> <chr> <chr>
# 1 A 20150124 SKU1 SKU3
# 2 A 20150125 SKU4 <NA>
# 3 B 20150125 SKU2 <NA>
# 4 B 20150127 SKU7 <NA>
# 5 C 20150126 SKU6 SKU8
out1 %>% select(-V1, -V2)
# # A tibble: 5 × 2
# `1` `2`
# * <chr> <chr>
# 1 SKU1 SKU3
# 2 SKU4 <NA>
# 3 SKU2 <NA>
# 4 SKU7 <NA>
# 5 SKU6 SKU8
Upvotes: 1