Andrew M
Andrew M

Reputation: 101

R generate list in data.frame from unique pairs

I have a csv file of purchase history - item, date, price, vendor and so forth. I'd like to generate a 2-column table with the item in 1, and all of the vendors that supplied that item. Initial processing to remove the extraneous columns leaves me with rows of item and vendor pair for each transaction.

    > vendorTable <- data.frame("Item" = sample(c("Item_A", "Item_B", "Item_B_raw", "Item C", "Item 4"), 40, replace =T),"Vendors" = sample(c("Vendor_1", "Vendor_2", "Vendor 3", "Vendor4", "Vendor 5."))
stringsAsFactors - FALSE)
> vendorTable
         Item   Vendors
1      Item_B  Vendor 3
2      Item_B   Vendor4
3  Item_B_raw  Vendor_2
4      Item 4 Vendor 5.
5      Item_A  Vendor_1
6      Item_A  Vendor 3
7      Item_A   Vendor4
8      Item C  Vendor_2
9      Item_A Vendor 5.
10 Item_B_raw  Vendor_1
11     Item_A  Vendor 3
12     Item C   Vendor4
13     Item_B  Vendor_2
14     Item C Vendor 5.
15     Item 4  Vendor_1
16     Item C  Vendor 3
17 Item_B_raw   Vendor4
18     Item_B  Vendor_2
19 Item_B_raw Vendor 5.
20     Item_B  Vendor_1

Then that can be pared down with unique()

> vendorTable <- unique(vendorTable)
> vendorTable
         Item   Vendors
1      Item_B  Vendor 3
2      Item_B   Vendor4
3  Item_B_raw  Vendor_2
4      Item 4 Vendor 5.
5      Item_A  Vendor_1
6      Item_A  Vendor 3
7      Item_A   Vendor4
8      Item C  Vendor_2
9      Item_A Vendor 5.
10 Item_B_raw  Vendor_1
12     Item C   Vendor4
13     Item_B  Vendor_2
14     Item C Vendor 5.
15     Item 4  Vendor_1
16     Item C  Vendor 3
17 Item_B_raw   Vendor4
19 Item_B_raw Vendor 5.
20     Item_B  Vendor_1
21     Item 4  Vendor 3
22     Item 4   Vendor4
23     Item 4  Vendor_2
26 Item_B_raw  Vendor 3
30     Item C  Vendor_1
34     Item_B Vendor 5.

That's where I get stuck. I know you can have a list in a data frame column,

>df <- data.frame(x = 1:3)
>df$y <- list(1:2, 1:3, 1:4)
>df

  x          y
1 1       1, 2
2 2    1, 2, 3
3 3 1, 2, 3, 4

however how to go from where I am, to where I want to be:

Item        Vendor
Item_A      Vendor_1, Vendor 3, Vendor4
Item_B      Vendor_2, Vendor 3, Vendor4, Vendor 5.
etc.

is beyond my current skills, and searching has led down numerous dead ends. I suspect I'll be embarrassed by the simplicity of the answer, but such is life.

Upvotes: 1

Views: 58

Answers (2)

akrun
akrun

Reputation: 887118

With tidyverse, we can nest

library(tidyverse)
vendorTable %>%
       nest(Vendors, .key = Vendors)  
#        Item                                          Vendors
#1     Item_B  Vendor 3, Vendor4, Vendor_2, Vendor_2, Vendor_1
#2 Item_B_raw           Vendor_2, Vendor_1, Vendor4, Vendor 5.
#3     Item 4                              Vendor 5., Vendor_1
#4     Item_A Vendor_1, Vendor 3, Vendor4, Vendor 5., Vendor 3
#5     Item C           Vendor_2, Vendor4, Vendor 5., Vendor 3

Or use summarise and list

vendorTable %>% 
     group_by(Item) %>% 
     summarise(Vendors = list(Vendors))
# A tibble: 5 x 2
#   Item       Vendors  
#   <chr>      <list>   
#1 Item 4     <chr [2]>
#2 Item C     <chr [4]>
#3 Item_A     <chr [5]>
#4 Item_B     <chr [5]>
#5 Item_B_raw <chr [4]>

Upvotes: 1

Onyambu
Onyambu

Reputation: 79228

aggregate(.~Item,vendorTable,function(x)as.character(x))
        Item                                                                                             Vendors
1     Item 4                                 Vendor_1, Vendor4, Vendor_1, Vendor4, Vendor 5., Vendor_1, Vendor 3
2     Item C Vendor_2, Vendor 3, Vendor_2, Vendor4, Vendor 5., Vendor_2, Vendor4, Vendor_2, Vendor 5., Vendor 5.
3     Item_A                      Vendor4, Vendor_1, Vendor 3, Vendor_1, Vendor 5., Vendor_2, Vendor_1, Vendor_1
4     Item_B                                         Vendor 5., Vendor 3, Vendor 5., Vendor_1, Vendor4, Vendor_2
5 Item_B_raw             Vendor_2, Vendor4, Vendor_2, Vendor 3, Vendor 3, Vendor 3, Vendor 5., Vendor 3, Vendor4

Upvotes: 2

Related Questions