Reputation: 105
Assume I have very big TSV file with over 20M lines that look like this:
a b {"condition1":["ABC"], "condition3":false, "condition4":4000}
c c {"condition1":["BBB"],"condition2":true}
I need it to look like:
Var1 Var2 Condition1 Condition2 Condition3 Condition4
a b ABC NA FALSE 4000
c c BBB TRUE NA NA
I tried the following code, but it is: a. inefficient b. does not work
anyway out-of-the-shelf solution to separate the 3rd column while reading?
dt<-fread(input = ifilename, header = T,encoding = "UTF-8" )
output<-dt[,c("filter")] #assume the third column named "filter"
fwrite(x = output,file = "./DB/filter.csv",)
filter.db<-fread(input ="./DB/filter.csv",fill=T)
Upvotes: 1
Views: 277
Reputation: 7784
*nix tools may be faster in this case, as the json parsers in R are a bit slow in my testing.
> library(data.table)
> aTbl = fread(cmd="cat foo.txt | grep -P -o '^\\w+\\s+\\w+'", header=F)
> aTbl
V1 V2
1: a b
2: c c
> bTbl = fread(cmd="cat foo.txt | grep -P -o '[{].*$' | jq -r '[ .condition1[], .condition2, .condition3, .condition4 ] | @csv'", header=F)
> bTbl
V1 V2 V3 V4
1: ABC NA FALSE 4000
2: BBB TRUE NA NA
> setnames(aTbl, c('Var1', 'Var2'))
> setnames(bTbl, c('Condition1', 'Condition2', 'Condition3', 'Condition4'))
> cTbl = cbind(aTbl, bTbl)
> cTbl
Var1 Var2 Condition1 Condition2 Condition3 Condition4
1: a b ABC NA FALSE 4000
2: c c BBB TRUE NA NA
>
Upvotes: 0
Reputation: 83255
A possible solution:
library(data.table)
library(jsonlite)
to_add <- rbindlist(lapply(dt$V3, function(x) setDT(fromJSON(x))), fill = TRUE)
setcolorder(to_add, sort(names(to_add)))
dt[, names(to_add) := to_add][, V3 := NULL][]
which gives:
V1 V2 condition1 condition2 condition3 condition4 1: a b ABC NA FALSE 4000 2: c c BBB TRUE NA NA
Used data:
dt <- structure(list(V1 = c("a", "c"),
V2 = c("b", "c"),
V3 = c("{\"condition1\":[\"ABC\"], \"condition3\":false, \"condition4\":4000}",
"{\"condition1\":[\"BBB\"],\"condition2\":true}")),
.Names = c("V1", "V2", "V3"), row.names = c(NA, -2L), class = c("data.table", "data.frame"))
Upvotes: 1