EcologyTom

Reputation: 2500

Divide long list into shorter lists in R

I have a long list of objects that I need to divide into smaller lists, each with 20 entries. The catch is that each object can only appear once in a single list.

# Create some example data... 
# Make a list of objects.
LIST <- c('Oranges', 'Toast', 'Truck', 'Dog', 'Hippo', 'Bottle', 'Hope', 'Mint', 'Red', 'Trees', 'Watch', 'Cup', 'Pencil', 'Lunch', 'Paper', 'Peanuts', 'Cloud', 'Forever', 'Ocean', 'Train', 'Fork', 'Moon', 'Horse', 'Parrot', 'Leaves', 'Book', 'Cheese', 'Tin', 'Bag', 'Socks', 'Lemons', 'Blue', 'Plane', 'Hammock', 'Roof', 'Wind', 'Green', 'Chocolate', 'Car', 'Distance')

# Generate a longer list, with a random sequence and number of repetitions for each entry
LONG.LIST <- data.frame(Name = sample(LIST, size = 200, replace = TRUE))

print(LONG.LIST)

Name
1         Cup
2    Distance
3        Roof
4      Pencil
5       Lunch
6       Toast
7       Watch
8      Bottle
9         Car
10       Roof
11      Lunch
12    Forever
13     Cheese
14    Oranges
15      Ocean
16  Chocolate
17      Socks
18     Leaves
19    Oranges
20   Distance
21      Green
22      Paper
23        Red
24      Paper
25      Trees
26  Chocolate
27     Bottle
28        Dog
29       Wind
30     Parrot
etc....

Using the example generated above, 'Distance' appears at both position 2 and position 20, 'Lunch' at both 5 and 11, 'Oranges' at 14 and 19, and 'Roof' at 3 and 10. The first 20 rows therefore hold only 16 distinct entries, so the first list without duplicates would need to extend to include 'Green', 'Paper', 'Red' and 'Trees' (positions 21 to 25, skipping the repeated 'Paper' at position 24). The second list would then begin with 'Chocolate' at position 26.

The last list is likely to be incomplete, so it would be good to pad it with NAs.

It would be simplest if the output were columns in a single data frame.

I've no idea where to even start with this, so any suggestions are really appreciated. Thanks!
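A minimal base-R sketch of the rule described above, assuming that an entry already present in the current list is simply skipped (not carried over to the next list), which is how the worked example reads. The helper name `chunk_unique` and its `n` argument are hypothetical, not part of any package. It walks through the entries in order, closes a list once it holds `n` distinct values, pads the final list with NA, and returns one list per column.

```r
# Hypothetical helper (not from any package): walk through `x` in order and
# fill consecutive lists of `n` distinct entries. Assumption: an entry that is
# already in the current list is skipped, not carried over to the next list.
chunk_unique <- function(x, n = 20) {
  x <- as.character(x)                 # works for factor or character input
  chunks <- list()
  current <- character(0)
  for (item in x) {
    if (item %in% current) next        # repeat within this list: skip it
    current <- c(current, item)
    if (length(current) == n) {        # list is full: store it, start a new one
      chunks[[length(chunks) + 1]] <- current
      current <- character(0)
    }
  }
  if (length(current) > 0) {           # pad the last, incomplete list with NA
    chunks[[length(chunks) + 1]] <- c(current, rep(NA, n - length(current)))
  }
  # One list per column: List1, List2, ...
  names(chunks) <- paste0("List", seq_along(chunks))
  as.data.frame(chunks, stringsAsFactors = FALSE)
}
```

With the question's data this would be called as `chunk_unique(LONG.LIST$Name, n = 20)`. A plain loop is used because whether an entry fits depends on what the current list already holds, which is awkward to express with a single vectorised call.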

Upvotes: 1

Views: 566

Answers (1)

akrun

Reputation: 887078

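We can do this with the tidyverse. Group by 'Name' and use row_number() to number each occurrence ('grp'), so the k-th appearance of every name gets grp = k. Then group by 'grp' and create a second sequence column, 'ind', giving each entry's position within its occurrence group. Finally, convert to 'wide' format with spread(), so each value of 'grp' becomes a column (and no column contains the same name twice), and sort the entries within each column alphabetically.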

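library(tidyverse)
LONG.LIST %>%
   group_by(Name) %>%
   mutate(grp = row_number()) %>%     # occurrence number of each Name (1st, 2nd, ...)
   group_by(grp) %>%
   mutate(ind = row_number()) %>%     # position within each occurrence group
   spread(grp, Name) %>%              # one column per occurrence number, NA-filled
   mutate(across(-ind, ~ .x[order(as.character(.x))]))  # sort each column, NAs last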
# A tibble: 40 x 12
#     ind       `1`       `2`      `3`      `4`      `5`      `6`      `7`      `8`      `9`     `10`     `11`
#   <int>    <fctr>    <fctr>   <fctr>   <fctr>   <fctr>   <fctr>   <fctr>   <fctr>   <fctr>   <fctr>   <fctr>
# 1     1       Bag       Bag      Bag      Bag      Bag      Bag      Bag      Bag      Cup Distance Distance
# 2     2      Blue      Blue     Book     Book     Book    Cloud      Cup      Cup Distance    Train       NA
# 3     3      Book      Book   Bottle    Cloud    Cloud      Cup Distance Distance    Train       NA       NA
# 4     4    Bottle    Bottle   Cheese      Cup      Cup Distance      Dog  Hammock       NA       NA       NA
# 5     5       Car       Car    Cloud Distance Distance      Dog  Hammock     Moon       NA       NA       NA
# 6     6    Cheese    Cheese      Cup      Dog      Dog  Hammock     Moon   Parrot       NA       NA       NA
# 7     7 Chocolate Chocolate Distance     Fork  Hammock    Horse    Paper    Train       NA       NA       NA
# 8     8     Cloud     Cloud      Dog  Hammock    Horse     Moon   Parrot       NA       NA       NA       NA
# 9     9       Cup       Cup     Fork    Hippo     Mint    Paper    Train       NA       NA       NA       NA
#10    10  Distance  Distance    Green    Horse     Moon   Parrot       NA       NA       NA       NA       NA
# ... with 30 more rows
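For comparison, a minimal base-R sketch of the same occurrence-numbering idea, assuming the names are available as a plain vector. The helper name `occurrence_columns` is hypothetical, not part of any package. Here `ave(..., FUN = seq_along)` plays the role of `group_by(Name) %>% mutate(grp = row_number())`, `split()` stands in for `spread()`, and the sorted columns are padded with NA to a common length.

```r
# Hypothetical base-R helper: the k-th occurrence of each name goes to column k,
# mirroring the grp/ind/spread steps of the tidyverse pipeline above.
occurrence_columns <- function(x) {
  x <- as.character(x)
  # 1 the first time a value appears, 2 the second time, and so on
  grp <- ave(seq_along(x), x, FUN = seq_along)
  cols <- lapply(split(x, grp), sort)   # one sorted vector per occurrence number
  n <- max(lengths(cols))
  padded <- lapply(cols, function(v) c(v, rep(NA, n - length(v))))  # pad with NA
  out <- as.data.frame(padded, stringsAsFactors = FALSE)
  names(out) <- names(padded)           # keep the column names "1", "2", ...
  out
}
```

With the question's data this would be `occurrence_columns(LONG.LIST$Name)`. Because column 1 holds every distinct name once, the number of rows equals the number of distinct names rather than a fixed 20.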

Upvotes: 3
