abhiieor
abhiieor

Reputation: 3554

parsing list string vector to multiple column data.table

I am trying to convert data of form:

dt <- data.table(foo = c(c('a=a1|b=b1'),c('a=a2|b=b2|c=c2'),c('a=a3|d=d3')))

to form:

data.table(a=c('a1','a2','a3'),b=c('b1','b2',NA),c=c(NA,'c2',NA),d=c(NA,NA,'d3'))

I tried to parse first step using:

lapply(dt$foo, function(x) unlist(strsplit(x, split = '|', fixed = T)))

but couldn't proceed further. Any pointers?

Upvotes: 2

Views: 369

Answers (3)

MKR
MKR

Reputation: 20085

An option is to use read.table function to read values as key-value pair and then finally convert into data.frame. The dplyr::bind_rows can help to join different rows.

dt <- data.table(foo = c(c('a=a1|b=b1'),c('a=a2|b=b2|c=c2'),c('a=a3|d=d3')))

library(dplyr)
bind_rows(mapply(function(x){
  t <- read.table(text = gsub("\\|","\n",x), sep=c("="), stringsAsFactors=FALSE)
  t <- as.data.frame(t(t), stringsAsFactors = FALSE)
  colnames(t) <- t[1,]
  t <- t[-1,]
  }, dt$foo))
#    a    b    c    d
# 1 a1   b1 <NA> <NA>
# 2 a2   b2   c2 <NA>
# 3 a3 <NA> <NA>   d3

UPDATED: data.table based solution as suggested by @abhiieor will be as:

library(data.table)
rbindlist(mapply(function(x){
  t <- read.table(text = gsub("\\|","\n",x), sep=c("="), stringsAsFactors=FALSE)
  t <- as.data.frame(t(t), stringsAsFactors = FALSE)
  colnames(t) <- t[1,]
  t <- t[-1,]
  }, dt$foo), use.names = T, fill = T)

Upvotes: 1

A5C1D2H2I1M1N2O1R2T1
A5C1D2H2I1M1N2O1R2T1

Reputation: 193507

Converting my comment to an answer, you can try:

library(splitstackshape)
cSplit(dt[, row := .I], "foo", "|", "long")[
  , cSplit(.SD, "foo", "=")][
    , dcast(.SD, row ~ foo_1, value.var = "foo_2")]
#    row  a    b    c    d
# 1:   1 a1   b1 <NA> <NA>
# 2:   2 a2   b2   c2 <NA>
# 3:   3 a3 <NA> <NA>   d3

You can, of course, also just use a combination of strsplit, tstrsplit, and dcast as well.

dt[, unlist(strsplit(foo, "|", TRUE)), 1:nrow(dt)][
  , c("col", "val") := tstrsplit(V1, "=", fixed = TRUE)][
    , dcast(.SD, nrow ~ col, value.var = "val")]
#    nrow  a    b    c    d
# 1:    1 a1   b1 <NA> <NA>
# 2:    2 a2   b2   c2 <NA>
# 3:    3 a3 <NA> <NA>   d3

Upvotes: 3

Calum You
Calum You

Reputation: 15062

Will update if more cases are provided. This is not a data.table because I don't use it, but afaik it should still work? Maybe if coerced to data.frame first.

library(tidyverse)
dt <- tibble(foo = c(c('a=a1|b=b1'),c('a=a2|b=b2|c=c2'),c('a=a3|d=d3')))
tibble(a=c('a1','a2','a3'),b=c('b1','b2',NA),c=c(NA,'c2',NA),d=c(NA,NA,'d3'))
#> # A tibble: 3 x 4
#>   a     b     c     d    
#>   <chr> <chr> <chr> <chr>
#> 1 a1    b1    <NA>  <NA> 
#> 2 a2    b2    c2    <NA> 
#> 3 a3    <NA>  <NA>  d3

dt %>%
  mutate(foo = str_split(foo, pattern = "\\|")) %>%
  rowid_to_column() %>% 
  unnest() %>%
  separate(foo, into = c("col", "val"), sep = "=") %>%
  spread(col, val)
#> # A tibble: 3 x 5
#>   rowid a     b     c     d    
#>   <int> <chr> <chr> <chr> <chr>
#> 1     1 a1    b1    <NA>  <NA> 
#> 2     2 a2    b2    c2    <NA> 
#> 3     3 a3    <NA>  <NA>  d3

Created on 2018-04-12 by the reprex package (v0.2.0).

Upvotes: 4

Related Questions