Reputation: 1829
I have a csv file with the following structure:
Input
{"eid":"START","ver":"3.0","ets":1514764800238}}
{"eid":"INTERACT","ver":"3.0","ets":1514764820546}}
{"eid":"IMPRESSION","ver":"3.0","ets":895732}}
{"eid":"IMPRESSION","ver":"3.0","ets":245636}}
{"eid":"INTERACT","ver":"3.0","ets":535235423525}}
As you can see, its not a valid json, for the above to be valid json, the structure should be the following:
Expected Output
[{"eid":"START","ver":"3.0","ets":1514764800238},
{"eid":"INTERACT","ver":"3.0","ets":1514764820546},
{"eid":"IMPRESSION","ver":"3.0","ets":895732},
{"eid":"IMPRESSION","ver":"3.0","ets":245636},
{"eid":"INTERACT","ver":"3.0","ets":535235423525}]
Problem:
I would like to ideally read the file and fix it and save as a JSON, that is
I tried using fromJSON(rjson), read_delim, but I could not read it.
Thanks in advance
Upvotes: 0
Views: 101
Reputation: 269586
Note that this question is nearly a duplicate of Extraction of different types of variables from a large list
Aside from reading it in and running fromJSON
(jsonlite package) a single line of base code can convert it to valid JSON (in variable json
).
"}}"
with "}"
using sub
on each input line,toString
and"["
and "]"
using c
Code:
library(jsonlite)
L <- readLines("test.json")
json <- c("[", toString(sub("}}", "}", L)), "]")
fromJSON(json)
giving:
eid ver ets
1 START 3.0 1.514765e+12
2 INTERACT 3.0 1.514765e+12
3 IMPRESSION 3.0 8.957320e+05
4 IMPRESSION 3.0 2.456360e+05
5 INTERACT 3.0 5.352354e+11
This could alternately be expressed as a pipeline giving the same output:
library(jsonlite)
library(magrittr)
"test.json" %>%
sub("}}", "}", .) %>%
toString %>%
c("[", ., "]") %>%
fromJSON
Test input is generated with the code below:
Lines <- c('{"eid":"START","ver":"3.0","ets":1514764800238}}',
'{"eid":"INTERACT","ver":"3.0","ets":1514764820546}}',
'{"eid":"IMPRESSION","ver":"3.0","ets":895732}}',
'{"eid":"IMPRESSION","ver":"3.0","ets":245636}}',
'{"eid":"INTERACT","ver":"3.0","ets":535235423525}}')
writeLines(Lines, "test.json")
Upvotes: 0
Reputation: 78792
Manual find/replace is a terrible, terrible, terrible suggestion for reproducible workflows.
One option — assuming there really is a }}
at the end of each line and the file is in /tmp/badlines
:
library(magrittr)
library(ndjson)
readLines("/tmp/badlines") %>%
sub("\\}$", "", .) %>%
ndjson::flatten(cls = "tbl")
## # A tibble: 5 x 3
## eid ets ver
## <chr> <dbl> <chr>
## 1 START 1.51e12 3.0
## 2 INTERACT 1.51e12 3.0
## 3 IMPRESSION 8.96e 5 3.0
## 4 IMPRESSION 2.46e 5 3.0
## 5 INTERACT 5.35e11 3.0
Upvotes: 2