Reputation: 410
I'm trying to "automatically" convert a data.frame column to multiple columns.
Here's what the data frame looks like:
library(dplyr)
foo <- data_frame(ID = c(1,2),
Val = c("A", "B"),
Geom = c("[{X11,Y11,Z11}, {X12,Y12,Z12}, {X13,Y13,Z13}]", "[{X21,Y21,Z21},{X22,Y22,Z22},{X23,Y23,Z23}]"))
Here's what I would like it to look like:
bar <- data_frame(ID = c(1,1,1,2,2,2),
Val=c("A", "A", "A", "B", "B", "B"),
Geom1 = c("X11", "X12", "X13", "X21", "X22", "X23"),
Geom2 = c("Y11", "Y12", "Y13", "Y21", "Y22", "Y23"),
Geom3 = c("Z11", "Z12", "Z13", "Z21", "Z22", "Z23"))
The workflow I have in mind for this transformation has two parts:
1 - Convert Geom to an R structure, like: list(c("X11","Y11","Z11"), c(...), ...)
2 - Use tidyr::unnest() or tidyr::separate() to split that list into columns
I think I can handle the second part, but I can't find a good approach for the first. I could write this column to a CSV file and read it back in, but since my data.frame will be a Shiny reactive object, that would involve a lot of writing and reading.
I tried fromJSON() (from jsonlite, rjson and RJSONIO), but since the string isn't valid JSON, none of them can parse it.
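For illustration, step 1 of the workflow above (turning each Geom string into a list of character vectors) could be sketched in base R roughly as follows. This is only a sketch: the names `geom`, `triples` and `parsed` are made up for the example, and it assumes every triple is wrapped in `{}` with comma-separated values and no nested braces.

```r
# sample strings, copied from the Geom column above
geom <- c("[{X11,Y11,Z11}, {X12,Y12,Z12}, {X13,Y13,Z13}]",
          "[{X21,Y21,Z21},{X22,Y22,Z22},{X23,Y23,Z23}]")

# extract the text between each pair of braces: one character vector per input string
triples <- regmatches(geom, gregexpr("(?<=\\{)[^}]*(?=\\})", geom, perl = TRUE))

# split every triple on commas: one list of character vectors per input string
parsed <- lapply(triples, strsplit, split = ",", fixed = TRUE)

# first triple of the first row
parsed[[1]][[1]]
```

Each element of `parsed` then has the shape list(c("X11","Y11","Z11"), ...) described in step 1.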
Upvotes: 1
Views: 283
Reputation: 269644
1) dplyr This splits the data frame into rows and, for each row, uses gsub to put each triple on a separate line and read.table to parse Geom further. It then fixes the column names and does an ungroup. (The setNames line could be omitted if V1, V2 and V3 are OK instead of Geom1, Geom2 and Geom3.)
library(dplyr)
foo %>%
  group_by(ID, Val) %>%
  do(read.table(text = gsub("^..|..$|}, *{", "\n", .$Geom, perl = TRUE), sep = ",", as.is = TRUE)) %>%
  setNames(sub("^V(\\d+)", "Geom\\1", colnames(.))) %>%
  ungroup()
giving:
Source: local data frame [6 x 5]

     ID   Val Geom1 Geom2 Geom3
  (dbl) (chr) (chr) (chr) (chr)
1     1     A   X11   Y11   Z11
2     1     A   X12   Y12   Z12
3     1     A   X13   Y13   Z13
4     2     B   X21   Y21   Z21
5     2     B   X22   Y22   Z22
6     2     B   X23   Y23   Z23
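To see what the gsub call contributes on its own, here is a small base R sketch that applies the same pattern to a single Geom string and then hands the result to read.table. The names `x`, `lines_txt` and `parsed` are only for illustration.

```r
# one Geom string, copied from the question
x <- "[{X11,Y11,Z11}, {X12,Y12,Z12}, {X13,Y13,Z13}]"

# replace the leading "[{", the trailing "}]" and every "}, {" separator with a newline
lines_txt <- gsub("^..|..$|}, *{", "\n", x, perl = TRUE)

# each triple now sits on its own line
cat(lines_txt)

# read.table skips the blank first and last lines and splits the rest on commas
parsed <- read.table(text = lines_txt, sep = ",", as.is = TRUE)
parsed
```

The resulting data frame has one row per triple and columns V1, V2 and V3, which is why the setNames step renames them afterwards.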
2) No packages This uses the same approach but without any packages. The last line of code could be omitted if V1, V2, V3 are OK instead of Geom1, Geom2 and Geom3.
bar <- do.call("rbind", by(foo, foo$ID, function(x)
  cbind(x[1:2], read.table(text = gsub("^..|..$|}, *{", "\n", x$Geom, perl = TRUE), sep = ","))))
names(bar) <- sub("^V(\\d+)", "Geom\\1", names(bar))
giving:
> bar
    ID Val Geom1 Geom2 Geom3
1.1  1   A   X11   Y11   Z11
1.2  1   A   X12   Y12   Z12
1.3  1   A   X13   Y13   Z13
2.1  2   B   X21   Y21   Z21
2.2  2   B   X22   Y22   Z22
2.3  2   B   X23   Y23   Z23
Upvotes: 1
Reputation: 38500
Here is one method using base R:
# vector to work with
geom <- c("[{X11,Y11,Z11}, {X12,Y12,Z12}, {X13,Y13,Z13}]", "[{X21,Y21,Z21},{X22,Y22,Z22},{X23,Y23,Z23}]")
# remove extraneous characters and split into list using "},"
geom <- strsplit(gsub("[]{ []", "", geom), split="},")
# remove two "}"s
geom <- sapply(geom, function(i) gsub("}", "", i))
# make a list of elements
geom <- strsplit(geom, split=",")
# construct the variables
geomData <- data.frame(t(sapply(geom, function(i) sapply(1:3, function(row) c(i[row])))))
# give names to data frame
names(geomData) <- c("Geom1", "Geom2", "Geom3")
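# final data.frame: repeat each ID/Val row once per triple so the rows line up with geomData
fooNew <- cbind(foo[rep(seq_len(nrow(foo)), each = nrow(geomData) / nrow(foo)), 1:2], geomData)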
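The sapply steps above rely on every row holding the same number of triples. As a rough sketch of how the same idea might be adapted when the counts differ, the variant below records the number of triples per row and uses it to repeat ID and Val. The ragged `foo` here is hypothetical test data, and `triples`, `n` and `parts` are illustrative names, not part of the original answer.

```r
# hypothetical input where the rows hold different numbers of triples
foo <- data.frame(ID = c(1, 2),
                  Val = c("A", "B"),
                  Geom = c("[{X11,Y11,Z11}, {X12,Y12,Z12}]", "[{X21,Y21,Z21}]"),
                  stringsAsFactors = FALSE)

# strip brackets, opening braces and spaces, then split each string on "},"
triples <- strsplit(gsub("[]{ []", "", foo$Geom), split = "},", fixed = TRUE)

# number of triples found in each original row
n <- lengths(triples)

# drop the leftover "}" and split every triple into its elements
parts <- strsplit(gsub("}", "", unlist(triples), fixed = TRUE), ",", fixed = TRUE)

# one row per triple, one column per element
geomData <- as.data.frame(do.call(rbind, parts), stringsAsFactors = FALSE)
names(geomData) <- paste0("Geom", seq_along(geomData))

# repeat each ID/Val row n times so it lines up with its triples
fooNew <- cbind(foo[rep(seq_len(nrow(foo)), n), 1:2], geomData)
rownames(fooNew) <- NULL
fooNew
```

This sketch still assumes each triple has the same number of elements, since do.call(rbind, ...) needs vectors of equal length.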
Upvotes: 2
Reputation: 31171
A solution with data.table/splitstackshape:
library(data.table)
library(splitstackshape)
dt = setDT(foo)[, strsplit(gsub('\\[{|}\\]', '', Geom, perl = TRUE), '}, *{', perl = TRUE), .(ID, Val)]
cSplit(dt, 'V1')
#    ID Val V1_1 V1_2 V1_3
# 1:  1   A  X11  Y11  Z11
# 2:  1   A  X12  Y12  Z12
# 3:  1   A  X13  Y13  Z13
# 4:  2   B  X21  Y21  Z21
# 5:  2   B  X22  Y22  Z22
# 6:  2   B  X23  Y23  Z23
Upvotes: 4