Reputation: 411

melt data table and split values

I have a column in a data table which is a list of comma separated values

dt = data.table( a = c('a','b','c'), b = c('xx,yy,zz','mm,nn','qq,rr,ss,tt'))
> dt
   a           b
1: a    xx,yy,zz
2: b       mm,nn
3: c qq,rr,ss,tt

I would like to transform it into a long format

   a  b
1: a xx
2: a yy
3: a zz
4: b mm
5: b nn
6: c qq
7: c rr
8: c ss
9: c tt

This question has been answered for a data frame here. I'm wondering if there is an elegant data table solution.

Upvotes: 0

Answers (3)

lmo

Reputation: 38510

The following will work for your example:

dt[, c(b=strsplit(b, ",")), by=a]
   a b
1: a xx
2: a yy
3: a zz
4: b mm
5: b nn
6: c qq
7: c rr
8: c ss
9: c tt

This method fails if the "by" variable is repeated as in

dt = data.table(a = c('a','b','c', 'a'),
                b = c('xx,yy,zz','mm,nn','qq,rr,ss,tt', 'zz,gg,tt'))

One robust solution in this situation can be had by using paste to collapse all observations with the same grouping variable (a) and feeding the result to the code above.

dt[, .(b=paste(b, collapse=",")), by=a][, c(b=strsplit(b, ",")), by=a]

This returns

    a  b
 1: a xx
 2: a yy
 3: a zz
 4: a zz
 5: a gg
 6: a tt
 7: b mm
 8: b nn
 9: c qq
10: c rr
11: c ss
12: c tt

Upvotes: 4

Rilgar17

Reputation: 181

There is another method, but this method involves another package : splitstackshape.

library(splitstackshape)
cSplit(dt, "b", sep = ",", direction = "long")
   a  b
1: a xx
2: a yy
3: a zz
4: b mm
5: b nn
6: c qq
7: c rr
8: c ss
9: c tt

This function uses data.table to work. And this work even if we have multiple same value for the column "a".

Upvotes: 1

akrun

Reputation: 887691

We can split the column 'b' by the delimiter ',' (using strsplit), grouped by 'a' and set the name of the new column i.e. 'V1' to 'b' with setnames

setnames(dt[, strsplit(b, ','), by = a], "V1", "b")[]
#   a  b
#1: a xx
#2: a yy
#3: a zz
#4: b mm
#5: b nn
#6: c qq
#7: c rr
#8: c ss
#9: c tt

If there are repeating elements in 'a' as in the below example

dt <- data.table(a = c('a','b','c', 'a'),
            b = c('xx,yy,zz','mm,nn','qq,rr,ss,tt', 'zz,gg,tt'))

we can group by the sequence of rows, do the strsplit on 'b', concatenate with the 'a' column and assign (:=) the 'grp' to NULL

dt[, c(a=a, b=strsplit(b, ",")), .(grp = 1:nrow(dt))][, grp := NULL][]
#    a  b
# 1: a xx
# 2: a yy
# 3: a zz
# 4: b mm
# 5: b nn
# 6: c qq
# 7: c rr
# 8: c ss
# 9: c tt
#10: a zz
#11: a gg
#12: a tt

NOTE: Both the methods are data.table methods

Upvotes: -2

melt data table and split values

Answers (3)

Related Questions