cboettig

Reputation: 12677

separate data into columns given by another column in tidyr

I am tidying data in which the desired column name mapping is given in a separate column, like so:

df <- data.frame(splitme = c("6, 7, 8, 9", "1,2,3"), 
                 type = c("A, B, C, D", "A, C, D"))

df looks like:

     splitme       type
  6, 7, 8, 9 A, B, C, D
       1,2,3    A, C, D

The desired output should look like:

desired_output <- data.frame(A = c(6,1), 
                             B = c(7, NA), 
                             C = c(8,2), 
                             D = c(9,3))

i.e.:

  A  B C D
  6  7 8 9
  1 NA 2 3

If it were not for the fact that some rows have missing types, this would be a straightforward task for tidyr::separate.

## Not correctly aligned
library(dplyr)  # provides %>% and select()
df %>% 
  tidyr::separate(splitme, into = c("A", "B", "C", "D")) %>% 
  select(-type)

but clearly the alignment poses issues: the second row has only three values, so they land in A, B, and C instead of A, C, and D. If only the into argument could take a column specifying the split rule. Perhaps there is a purrr::pmap_df based strategy that could be used here?

Upvotes: 5

Views: 58

Answers (2)

moodymudskipper

Reputation: 47320

Using purrr::map2_dfr, instead of parsing the splitme column, we use the string directly in a data.frame call. We name the columns, and map2_dfr binds the rows and deals with the missing values.

library(purrr)
map2_dfr(df$splitme,df$type,
         ~setNames(eval(parse(text=paste0("data.frame(",.x,")"))),
                   strsplit(.y,", ")[[1]]))
#   A  B C D
# 1 6  7 8 9
# 2 1 NA 2 3
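
If you would rather not go through eval(parse()), the same row-by-row idea can probably be written with strsplit alone. This is only a sketch: it assumes the values are numeric, that both columns are character vectors (hence stringsAsFactors = FALSE), and that dplyr is installed, since map2_dfr relies on it for the row binding. split_row is a hypothetical helper name, not part of any package.

```r
library(purrr)

df <- data.frame(splitme = c("6, 7, 8, 9", "1,2,3"),
                 type = c("A, B, C, D", "A, C, D"),
                 stringsAsFactors = FALSE)  # assumption: character columns

# Hypothetical helper: split one row's values and types on commas
# (ignoring surrounding whitespace) and return a one-row data frame
# whose columns are named by the types.
split_row <- function(values, types) {
  vals <- as.numeric(strsplit(values, "\\s*,\\s*")[[1]])
  nms <- strsplit(types, "\\s*,\\s*")[[1]]
  as.data.frame(as.list(setNames(vals, nms)))
}

# Bind the one-row data frames; types absent from a row become NA.
result <- map2_dfr(df$splitme, df$type, split_row)
result
```

Here the columns come out as numeric rather than character, and no string is ever evaluated as code.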

Upvotes: 1

akuiper

Reputation: 214957

You can use separate_rows followed by a reshape with spread:

library(dplyr); library(tidyr);
df %>% 
    # add a row identification number for reshaping purpose
    mutate(rn = row_number()) %>% 
    separate_rows(splitme, type) %>% 
    spread(type, splitme) %>% 
    select(-rn)

#  A    B C D
#1 6    7 8 9
#2 1 <NA> 2 3
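
Note that the <NA> in the output above indicates the values are not numeric. If numeric columns are wanted, a variation along the following lines may help. It is a sketch that assumes tidyr 1.0.0 or later (where pivot_wider is available as the newer counterpart to spread) and uses the convert argument of separate_rows to convert the split values.

```r
library(dplyr)
library(tidyr)

df <- data.frame(splitme = c("6, 7, 8, 9", "1,2,3"),
                 type = c("A, B, C, D", "A, C, D"),
                 stringsAsFactors = FALSE)  # assumption: character columns

result <- df %>%
  # row id so that pivot_wider keeps one output row per input row
  mutate(rn = row_number()) %>%
  # convert = TRUE converts the split values (here to integers)
  separate_rows(splitme, type, convert = TRUE) %>%
  pivot_wider(names_from = type, values_from = splitme) %>%
  select(-rn)
result
```

With this version the missing B in the second row should show up as a plain NA in an integer column.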

Upvotes: 5
