Reputation: 2500
I have a long list of objects that I need to divide into smaller lists, each with 20 entries. The catch is that no object may appear more than once within any one list.
# Create some example data...
# Make a list of objects.
LIST <- c('Oranges', 'Toast', 'Truck', 'Dog', 'Hippo', 'Bottle', 'Hope', 'Mint', 'Red', 'Trees', 'Watch', 'Cup', 'Pencil', 'Lunch', 'Paper', 'Peanuts', 'Cloud', 'Forever', 'Ocean', 'Train', 'Fork', 'Moon', 'Horse', 'Parrot', 'Leaves', 'Book', 'Cheese', 'Tin', 'Bag', 'Socks', 'Lemons', 'Blue', 'Plane', 'Hammock', 'Roof', 'Wind', 'Green', 'Chocolate', 'Car', 'Distance')
# Generate a longer list, with a random sequence and number of repetitions for each entry
LONG.LIST <- data.frame(Name = (sample(LIST, size = 200, replace = TRUE)))
print(LONG.LIST)
Name
1 Cup
2 Distance
3 Roof
4 Pencil
5 Lunch
6 Toast
7 Watch
8 Bottle
9 Car
10 Roof
11 Lunch
12 Forever
13 Cheese
14 Oranges
15 Ocean
16 Chocolate
17 Socks
18 Leaves
19 Oranges
20 Distance
21 Green
22 Paper
23 Red
24 Paper
25 Trees
26 Chocolate
27 Bottle
28 Dog
29 Wind
30 Parrot
etc....
Using the example generated above, 'Distance' appears at both position 2 and position 20, 'Lunch' at both 5 and 11, and 'Oranges' at 14 and 19, so the first list without duplicates would need to extend to include 'Green', 'Paper' and 'Red'. The second list would then begin with 'Paper' at position 24.
The last list is likely to be incomplete, so it would be good to pad it with NAs.
It would be simplest if the output were columns in a single data frame.
I've no idea where to even start with this, so any suggestions are really appreciated. Thanks!
Upvotes: 1
Views: 566
Reputation: 887078
We can do this with tidyverse. Grouped by 'Name', create a sequence column 'grp' that numbers each occurrence of a name. Then group by 'grp' to create a second sequence column 'ind', convert to 'wide' format with spread, and order the values within each column alphabetically:
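library(tidyverse)
LONG.LIST %>%
    # grp = 1 for the first occurrence of a name, 2 for the second, ...
    group_by(Name) %>%
    mutate(grp = row_number()) %>%
    # ind = row position inside each 'grp', used as the row key when spreading
    group_by(grp) %>%
    mutate(ind = row_number()) %>%
    # one column per 'grp'; shorter columns are filled with NA
    spread(grp, Name) %>%
    # sort every column except 'ind' (NAs go last); funs() is deprecated in newer dplyr
    mutate_at(vars(-one_of("ind")), funs(.[order(as.character(.))]))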
# A tibble: 40 x 12
#      ind `1`       `2`       `3`      `4`      `5`      `6`      `7`      `8`      `9`      `10`     `11`
#    <int> <fctr>    <fctr>    <fctr>   <fctr>   <fctr>   <fctr>   <fctr>   <fctr>   <fctr>   <fctr>   <fctr>
#  1     1 Bag       Bag       Bag      Bag      Bag      Bag      Bag      Bag      Cup      Distance Distance
#  2     2 Blue      Blue      Book     Book     Book     Cloud    Cup      Cup      Distance Train    NA
#  3     3 Book      Book      Bottle   Cloud    Cloud    Cup      Distance Distance Train    NA       NA
#  4     4 Bottle    Bottle    Cheese   Cup      Cup      Distance Dog      Hammock  NA       NA       NA
#  5     5 Car       Car       Cloud    Distance Distance Dog      Hammock  Moon     NA       NA       NA
#  6     6 Cheese    Cheese    Cup      Dog      Dog      Hammock  Moon     Parrot   NA       NA       NA
#  7     7 Chocolate Chocolate Distance Fork     Hammock  Horse    Paper    Train    NA       NA       NA
#  8     8 Cloud     Cloud     Dog      Hammock  Horse    Moon     Parrot   NA       NA       NA       NA
#  9     9 Cup       Cup       Fork     Hippo    Mint     Paper    Train    NA       NA       NA       NA
# 10    10 Distance  Distance  Green    Horse    Moon     Parrot   NA       NA       NA       NA       NA
# ... with 30 more rows
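To make the grouping idea easier to follow, here is a minimal base R sketch of the same logic on a toy vector. The vector `x` and the names `lists`, `wide` and `list1`, `list2`, ... are made up for illustration and are not part of the question's data. Note that this approach guarantees no name repeats within a list, but it does not force each list to hold exactly 20 entries.

```r
# Toy input (hypothetical, much shorter than LONG.LIST$Name)
x <- c("Cup", "Roof", "Cup", "Dog", "Roof", "Cup")

# Occurrence counter: 1 the first time a name is seen, 2 the second time, ...
grp <- ave(seq_along(x), x, FUN = seq_along)

# One list per counter value, so a name cannot repeat inside any list
lists <- split(x, grp)
names(lists) <- paste0("list", names(lists))

# Sort each list and pad with NA to a common length, one column per list
n <- max(lengths(lists))
wide <- as.data.frame(
  lapply(lists, function(v) c(sort(v), rep(NA_character_, n - length(v)))),
  stringsAsFactors = FALSE
)

print(wide)
```

The first column holds every distinct name once, the second holds the names that occur at least twice, and so on, which is why the columns get shorter from left to right.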
Upvotes: 3