January
January

Reputation: 17090

Turning a data frame and a list into long format with dplyr

Here is a puzzle.

Assume you have a data frame and a list. The list has as many elements as the df has rows:

dd <- data.frame(ID=1:3, Name=LETTERS[1:3])
dl <- map(4:6, rnorm) %>% set_names(letters[1:3])

Is there a simple way (preferably with dplyr / tidyverse) to make a long format, such that the elements of the list are joined with the corresponding rows of the data frame? Here is what I have in mind illustrated with not-so-elegant way:

rows <- map(1:length(dl), ~ rep(., length(dl[[.]]))) %>% unlist()
dd <- dd[rows,]
dd$value <- unlist(dl)

As you can see, for each vector in dl, we replicated the corresponding row as many times as necessary to accommodate each value.

Upvotes: 6

Views: 1059

Answers (3)

akrun
akrun

Reputation: 887108

We can create a list column and unnest

library(tidyverse)
dd %>% 
  mutate(value = dl) %>% 
  unnest
#   ID Name       value
#1   1    A  1.57984385
#2   1    A  0.66831102
#3   1    A -0.45472145
#4   1    A  2.33807619
#5   2    B  1.56716709
#6   2    B  0.74982763
#7   2    B  0.07025534
#8   2    B  1.31174561
#9   2    B  0.57901536
#10  3    C -1.36629653
#11  3    C -0.66437155
#12  3    C  2.12506187
#13  3    C  1.20220402
#14  3    C  0.10687018
#15  3    C  0.15973401

Note that if the criteria is based on the compactness of code, if we remove the %>%

unnest(mutate(dd, value = dl))

Or another option is uncount and mutate

dd %>% 
   uncount(lengths(dl)) %>%
   mutate(value = flatten_dbl(unname(dl)))

If it needs a join based on the names of the 'dl'

enframe(dl, name = 'Name') %>%
     mutate(Name = toupper(Name)) %>% 
     left_join(dd) %>% 
     unnest

In base R, we can replicate the rows of 'dd' with lengths of 'dl' and transform to create the 'value' as unlisted 'dl'

transform(dd[rep(seq_len(nrow(dd)), lengths(dl)),], value = unlist(dl))

Upvotes: 5

IceCreamToucan
IceCreamToucan

Reputation: 28685

Since a dplyr solution has already been provided, another option is to subset dl for each Name value in dd using data.table grouping

library(data.table)
setDT(dd)

dd[, .(values = dl[[tolower(Name)]]), by = .(ID, Name)]

#     ID Name      values
#  1:  1    A -1.09633600
#  2:  1    A -1.26238190
#  3:  1    A  1.15220845
#  4:  1    A -1.45741071
#  5:  2    B -0.49318131
#  6:  2    B  0.59912670
#  7:  2    B -0.73117632
#  8:  2    B -1.09646143
#  9:  2    B -0.79409753
# 10:  3    C -0.08205888
# 11:  3    C  0.21503398
# 12:  3    C -1.17541571
# 13:  3    C -0.10020616
# 14:  3    C -1.01152362
# 15:  3    C -1.03693337

Upvotes: 7

Cath
Cath

Reputation: 24074

In base R, you can get your result with stack followed by merge:

res <- merge(stack(dl), dd, by.x="ind", by.y="Name")

head(res)
#  ind      values ID
#1   A -0.79616693  1
#2   A  0.37720953  1
#3   A  1.30273712  1
#4   A  0.19483859  1
#5   B  0.18770716  2
#6   B -0.02226917  2

NB: I supposed the names for dl were supposed to be in uppercases but if they are indeed lowercase, the following line needs to be pass instead:

res <- merge(stack(setNames(dl, toupper(names(dl)))), dd, by.x="ind", by.y="Name")

Upvotes: 10

Related Questions