Reputation: 411
I have a column in a data table which is a list of comma separated values
dt = data.table( a = c('a','b','c'), b = c('xx,yy,zz','mm,nn','qq,rr,ss,tt'))
> dt
a b
1: a xx,yy,zz
2: b mm,nn
3: c qq,rr,ss,tt
I would like to transform it into a long format
a b
1: a xx
2: a yy
3: a zz
4: b mm
5: b nn
6: c qq
7: c rr
8: c ss
9: c tt
This question has been answered for a data frame here. I'm wondering if there is an elegant data table solution.
Upvotes: 0
Views: 832
Reputation: 38510
The following will work for your example:
dt[, c(b=strsplit(b, ",")), by=a]
a b
1: a xx
2: a yy
3: a zz
4: b mm
5: b nn
6: c qq
7: c rr
8: c ss
9: c tt
This method fails if the "by" variable is repeated as in
dt = data.table(a = c('a','b','c', 'a'),
b = c('xx,yy,zz','mm,nn','qq,rr,ss,tt', 'zz,gg,tt'))
One robust solution in this situation can be had by using paste
to collapse all observations with the same grouping variable (a) and feeding the result to the code above.
dt[, .(b=paste(b, collapse=",")), by=a][, c(b=strsplit(b, ",")), by=a]
This returns
a b
1: a xx
2: a yy
3: a zz
4: a zz
5: a gg
6: a tt
7: b mm
8: b nn
9: c qq
10: c rr
11: c ss
12: c tt
Upvotes: 4
Reputation: 181
There is another method, but this method involves another package : splitstackshape.
library(splitstackshape)
cSplit(dt, "b", sep = ",", direction = "long")
a b
1: a xx
2: a yy
3: a zz
4: b mm
5: b nn
6: c qq
7: c rr
8: c ss
9: c tt
This function uses data.table to work. And this work even if we have multiple same value for the column "a".
Upvotes: 1
Reputation: 887691
We can split
the column 'b' by the delimiter ',' (using strsplit
), grouped by 'a' and set the name of the new column i.e. 'V1' to 'b' with setnames
setnames(dt[, strsplit(b, ','), by = a], "V1", "b")[]
# a b
#1: a xx
#2: a yy
#3: a zz
#4: b mm
#5: b nn
#6: c qq
#7: c rr
#8: c ss
#9: c tt
If there are repeating elements in 'a' as in the below example
dt <- data.table(a = c('a','b','c', 'a'),
b = c('xx,yy,zz','mm,nn','qq,rr,ss,tt', 'zz,gg,tt'))
we can group by the sequence of rows, do the strsplit
on 'b', concatenate with the 'a' column and assign (:=
) the 'grp' to NULL
dt[, c(a=a, b=strsplit(b, ",")), .(grp = 1:nrow(dt))][, grp := NULL][]
# a b
# 1: a xx
# 2: a yy
# 3: a zz
# 4: b mm
# 5: b nn
# 6: c qq
# 7: c rr
# 8: c ss
# 9: c tt
#10: a zz
#11: a gg
#12: a tt
NOTE: Both the methods are data.table
methods
Upvotes: -2