RobinCura

Reputation: 410

Convert a Python-like list to nested R vectors

I'm trying to "automatically" convert a data.frame column to multiple columns.

Here's what the data frame looks like:

library(dplyr)
# tibble() replaces the deprecated data_frame() and is re-exported by dplyr
foo <- tibble(ID = c(1, 2),
              Val = c("A", "B"),
              Geom = c("[{X11,Y11,Z11}, {X12,Y12,Z12}, {X13,Y13,Z13}]", "[{X21,Y21,Z21},{X22,Y22,Z22},{X23,Y23,Z23}]"))

Here's what I would like it to look like:

bar <- tibble(ID = c(1, 1, 1, 2, 2, 2),
              Val = c("A", "A", "A", "B", "B", "B"),
              Geom1 = c("X11", "X12", "X13", "X21", "X22", "X23"),
              Geom2 = c("Y11", "Y12", "Y13", "Y21", "Y22", "Y23"),
              Geom3 = c("Z11", "Z12", "Z13", "Z21", "Z22", "Z23"))

The workflow I have in mind for this transformation has two parts:

1 - Convert Geom to an R structure, such as:

list(c("X11","Y11","Z11"), c(...), ...)

2 - Use tidyr::unnest() or tidyr::separate() to split that list into columns.
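A minimal base-R sketch of step 1, assuming every triple is wrapped in `{...}` and contains no nested braces. `parse_geom` is a hypothetical helper name used only for illustration, not an existing function:

```r
# Hypothetical helper: turn one "[{a,b,c}, {d,e,f}]" string into a list of character vectors
parse_geom <- function(s) {
  # extract each "{...}" group (assumes braces are never nested)
  triples <- regmatches(s, gregexpr("\\{[^}]*\\}", s, perl = TRUE))[[1]]
  # drop the braces and any spaces, then split each triple on ","
  strsplit(gsub("[{} ]", "", triples, perl = TRUE), ",")
}

geom <- c("[{X11,Y11,Z11}, {X12,Y12,Z12}, {X13,Y13,Z13}]",
          "[{X21,Y21,Z21},{X22,Y22,Z22},{X23,Y23,Z23}]")
# one list of triples per original row
parsed <- lapply(geom, parse_geom)
parsed[[1]]
```

Each element of `parsed` has the shape described in step 1; for example, `do.call(rbind, parsed[[1]])` turns the first row's triples into a 3 × 3 character matrix.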

I think I can handle the second part, but I can't find a good approach for the first. I could write this column to a CSV file and read it back in, but since my data.frame will be a Shiny reactive object, that would involve a lot of writing and reading.

I tried fromJSON() (from jsonlite, rjson and RJSONIO), but since this isn't a valid JSON string, none of them can parse it.
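Because the string is almost JSON, one possible workaround is to make it valid JSON first: quote every bare token and turn the `{}` groups into `[]` arrays. This is a sketch assuming every token consists only of letters, digits and underscores; the final parse is guarded so the block also runs where jsonlite is not installed:

```r
s <- "[{X11,Y11,Z11}, {X12,Y12,Z12}, {X13,Y13,Z13}]"
# quote every bare token (assumes tokens are alphanumeric/underscore only)
json <- gsub("([A-Za-z0-9_]+)", "\"\\1\"", s)
# JSON has no {a,b,c} lists, so map the braces to array brackets
json <- chartr("{}", "[]", json)
cat(json, "\n")

# parse only if jsonlite is available
if (requireNamespace("jsonlite", quietly = TRUE)) {
  m <- jsonlite::fromJSON(json)
  print(m)
}
```

With jsonlite's default simplification, an array of equal-length string arrays should come back as a character matrix with one row per triple.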

Upvotes: 1

Views: 283

Answers (3)

G. Grothendieck

Reputation: 269644

1) dplyr: This splits the data frame into rows and, for each row, uses gsub to put each triple on a separate line and read.table to parse Geom further. It then fixes the column names and ungroups. (The setNames line can be omitted if V1, V2 and V3 are acceptable instead of Geom1, Geom2 and Geom3.)

library(dplyr)

foo %>% 
   group_by(ID, Val) %>% 
   do(read.table(text = gsub("^..|..$|}, *{", "\n", .$Geom, perl = TRUE), sep = ",", as.is = TRUE)) %>% 
   setNames(sub("^V(\\d+)", "Geom\\1", colnames(.))) %>%
   ungroup()
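To see what the regular expression in the do() step produces, here is a base-R sketch applying it to a single Geom string: `^..` and `..$` replace the outer `[{` and `}]`, and `}, *{` replaces each separator between triples, so every triple lands on its own line for read.table:

```r
s <- "[{X11,Y11,Z11}, {X12,Y12,Z12}, {X13,Y13,Z13}]"
# replace the leading "[{", the trailing "}]" and every "}, {" (or "},{") with a newline
txt <- gsub("^..|..$|}, *{", "\n", s, perl = TRUE)
cat(txt)
# read.table skips the blank lines and splits each remaining line on ","
parsed <- read.table(text = txt, sep = ",", as.is = TRUE)
parsed
```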

giving:

Source: local data frame [6 x 5]

     ID   Val Geom1 Geom2 Geom3
  (dbl) (chr) (chr) (chr) (chr)
1     1     A   X11   Y11   Z11
2     1     A   X12   Y12   Z12
3     1     A   X13   Y13   Z13
4     2     B   X21   Y21   Z21
5     2     B   X22   Y22   Z22
6     2     B   X23   Y23   Z23

2) No packages: This uses the same approach without any packages. The last line of code can be omitted if V1, V2 and V3 are acceptable instead of Geom1, Geom2 and Geom3.

bar <- do.call("rbind", by(foo, foo$ID, function(x) 
   cbind(x[1:2], read.table(text = gsub("^..|..$|}, *{", "\n", x$Geom, perl = TRUE), sep = ",", as.is = TRUE))))
names(bar) <- sub("^V(\\d+)", "Geom\\1", names(bar))
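The `cbind(x[1:2], ...)` call relies on data.frame recycling: the one-row ID/Val slice is repeated to match the three rows that read.table returns. A small self-contained illustration using values from the question:

```r
one_row <- data.frame(ID = 1, Val = "A", stringsAsFactors = FALSE)
triples <- data.frame(V1 = c("X11", "X12", "X13"),
                      V2 = c("Y11", "Y12", "Y13"),
                      V3 = c("Z11", "Z12", "Z13"),
                      stringsAsFactors = FALSE)
# cbind() on data frames goes through data.frame(), which recycles the 1-row piece to 3 rows
combined <- cbind(one_row, triples)
combined
```

If the `1.1`, `1.2`, ... row names that by() and rbind produce are unwanted, `rownames(bar) <- NULL` resets them to plain integers.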

giving:

> bar
    ID Val Geom1 Geom2 Geom3
1.1  1   A   X11   Y11   Z11
1.2  1   A   X12   Y12   Z12
1.3  1   A   X13   Y13   Z13
2.1  2   B   X21   Y21   Z21
2.2  2   B   X22   Y22   Z22
2.3  2   B   X23   Y23   Z23

Upvotes: 1

lmo

Reputation: 38500

Here is one method using base R:

# vector to work with
geom <- foo$Geom
# remove extraneous characters ("[", "]", "{" and spaces) and split into a list on "},"
geom <- strsplit(gsub("[]{ []", "", geom), split = "},", fixed = TRUE)
# number of triples in each row of foo (needed to repeat ID and Val later)
n <- lengths(geom)
# remove the remaining "}" characters
geom <- gsub("}", "", unlist(geom), fixed = TRUE)
# split each triple into its three elements
geom <- strsplit(geom, split = ",")

# construct the variables: one row per triple
geomData <- data.frame(do.call(rbind, geom), stringsAsFactors = FALSE)
# give names to data frame
names(geomData) <- c("Geom1", "Geom2", "Geom3")

# final data.frame: repeat each row's ID and Val once per triple
fooNew <- cbind(foo[rep(seq_len(nrow(foo)), n), 1:2], geomData)

Upvotes: 2

Colonel Beauvel

Reputation: 31171

A solution with data.table/splitstackshape:

library(data.table)
library(splitstackshape)

dt = setDT(foo)[, strsplit(gsub('\\[{|}\\]', '', Geom, perl = TRUE), '}, *{', perl = TRUE), .(ID, Val)]

cSplit(dt, 'V1')
#   ID Val V1_1 V1_2 V1_3
#1:  1   A  X11  Y11  Z11
#2:  1   A  X12  Y12  Z12
#3:  1   A  X13  Y13  Z13
#4:  2   B  X21  Y21  Z21
#5:  2   B  X22  Y22  Z22
#6:  2   B  X23  Y23  Z23
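Before cSplit() runs, `dt` is in long format: one row per triple, with the whole triple in column V1. A base-R sketch of that intermediate step, using the same two regular expressions (the ID values are typed in by hand here for illustration):

```r
geom <- c("[{X11,Y11,Z11}, {X12,Y12,Z12}, {X13,Y13,Z13}]",
          "[{X21,Y21,Z21},{X22,Y22,Z22},{X23,Y23,Z23}]")
# strip the outer "[{" and "}]", then split on "}, {" or "},{"
triples <- strsplit(gsub("\\[{|}\\]", "", geom, perl = TRUE), "}, *{", perl = TRUE)
# long format: one row per triple, like dt before cSplit()
long <- data.frame(ID = rep(c(1, 2), lengths(triples)),
                   V1 = unlist(triples),
                   stringsAsFactors = FALSE)
long
```

cSplit() then splits V1 on its default "," separator into V1_1, V1_2 and V1_3.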

Upvotes: 4
