Reputation: 101
I have a csv file of purchase history - item, date, price, vendor and so forth. I'd like to generate a 2-column table with the item in 1, and all of the vendors that supplied that item. Initial processing to remove the extraneous columns leaves me with rows of item and vendor pair for each transaction.
> vendorTable <- data.frame("Item" = sample(c("Item_A", "Item_B", "Item_B_raw", "Item C", "Item 4"), 40, replace =T),"Vendors" = sample(c("Vendor_1", "Vendor_2", "Vendor 3", "Vendor4", "Vendor 5."))
stringsAsFactors - FALSE)
> vendorTable
Item Vendors
1 Item_B Vendor 3
2 Item_B Vendor4
3 Item_B_raw Vendor_2
4 Item 4 Vendor 5.
5 Item_A Vendor_1
6 Item_A Vendor 3
7 Item_A Vendor4
8 Item C Vendor_2
9 Item_A Vendor 5.
10 Item_B_raw Vendor_1
11 Item_A Vendor 3
12 Item C Vendor4
13 Item_B Vendor_2
14 Item C Vendor 5.
15 Item 4 Vendor_1
16 Item C Vendor 3
17 Item_B_raw Vendor4
18 Item_B Vendor_2
19 Item_B_raw Vendor 5.
20 Item_B Vendor_1
Then that can be pared down with unique()
> vendorTable <- unique(vendorTable)
> vendorTable
Item Vendors
1 Item_B Vendor 3
2 Item_B Vendor4
3 Item_B_raw Vendor_2
4 Item 4 Vendor 5.
5 Item_A Vendor_1
6 Item_A Vendor 3
7 Item_A Vendor4
8 Item C Vendor_2
9 Item_A Vendor 5.
10 Item_B_raw Vendor_1
12 Item C Vendor4
13 Item_B Vendor_2
14 Item C Vendor 5.
15 Item 4 Vendor_1
16 Item C Vendor 3
17 Item_B_raw Vendor4
19 Item_B_raw Vendor 5.
20 Item_B Vendor_1
21 Item 4 Vendor 3
22 Item 4 Vendor4
23 Item 4 Vendor_2
26 Item_B_raw Vendor 3
30 Item C Vendor_1
34 Item_B Vendor 5.
That's where I get stuck. I know you can have a list in a data frame column,
>df <- data.frame(x = 1:3)
>df$y <- list(1:2, 1:3, 1:4)
>df
x y
1 1 1, 2
2 2 1, 2, 3
3 3 1, 2, 3, 4
however how to go from where I am, to where I want to be:
Item Vendor
Item_A Vendor_1, Vendor 3, Vendor4
Item_B Vendor_2, Vendor 3, Vendor4, Vendor 5.
etc.
is beyond my current skills, and searching has led down numerous dead ends. I suspect I'll be embarrassed by the simplicity of the answer, but such is life.
Upvotes: 1
Views: 58
Reputation: 887118
With tidyverse
, we can nest
library(tidyverse)
vendorTable %>%
nest(Vendors, .key = Vendors)
# Item Vendors
#1 Item_B Vendor 3, Vendor4, Vendor_2, Vendor_2, Vendor_1
#2 Item_B_raw Vendor_2, Vendor_1, Vendor4, Vendor 5.
#3 Item 4 Vendor 5., Vendor_1
#4 Item_A Vendor_1, Vendor 3, Vendor4, Vendor 5., Vendor 3
#5 Item C Vendor_2, Vendor4, Vendor 5., Vendor 3
Or use summarise
and list
vendorTable %>%
group_by(Item) %>%
summarise(Vendors = list(Vendors))
# A tibble: 5 x 2
# Item Vendors
# <chr> <list>
#1 Item 4 <chr [2]>
#2 Item C <chr [4]>
#3 Item_A <chr [5]>
#4 Item_B <chr [5]>
#5 Item_B_raw <chr [4]>
Upvotes: 1
Reputation: 79228
aggregate(.~Item,vendorTable,function(x)as.character(x))
Item Vendors
1 Item 4 Vendor_1, Vendor4, Vendor_1, Vendor4, Vendor 5., Vendor_1, Vendor 3
2 Item C Vendor_2, Vendor 3, Vendor_2, Vendor4, Vendor 5., Vendor_2, Vendor4, Vendor_2, Vendor 5., Vendor 5.
3 Item_A Vendor4, Vendor_1, Vendor 3, Vendor_1, Vendor 5., Vendor_2, Vendor_1, Vendor_1
4 Item_B Vendor 5., Vendor 3, Vendor 5., Vendor_1, Vendor4, Vendor_2
5 Item_B_raw Vendor_2, Vendor4, Vendor_2, Vendor 3, Vendor 3, Vendor 3, Vendor 5., Vendor 3, Vendor4
Upvotes: 2