Reputation: 35
I have a csv file imported from a sql database, one of the column is made of JSON data. I used to be able to read_csv the file and then parse the JSON data using fromJSON. However, now when I read the file the double quotations marks around the JSON fields are automatically removed and I can't find a way to prevent that from happening.
Here is an example. CSV file looks like this (3 columns, 4 rows):
A,B,C
1,2,"[{"C1":"C6.bmp","C2": "C5.bmp"}]"
3,4,"[{"C1":"C6.bmp","C2": "C5.bmp"}]"
5,6,"[{"C1":"C6.bmp","C2": "C5.bmp"}]"
7,8,"[{"C1":"C6.bmp","C2": "C5.bmp"}]"
json_data<-read_csv(file="test.csv")
produces
> json_data
# A tibble: 4 × 3
A B C
<dbl> <dbl> <chr>
1 2 [{C1:C6.bmp,C2: C5.bmp}]
3 4 [{C1:C6.bmp,C2: C5.bmp}]
5 6 [{C1:C6.bmp,C2: C5.bmp}]
7 8 [{C1:C6.bmp,C2: C5.bmp}]
instead of the expected:
> json_data
# A tibble: 4 × 3
A B C
<dbl> <dbl> <chr>
1 2 [{"C1":"C6.bmp","C2": "C5.bmp"}]
3 4 [{"C1":"C6.bmp","C2": "C5.bmp"}]
5 6 [{"C1":"C6.bmp","C2": "C5.bmp"}]
7 8 [{"C1":"C6.bmp","C2": "C5.bmp"}]
Without quotation marks column C cannot be passed on to fromJSON...
Upvotes: 0
Views: 129
Reputation: 270120
1) sub Read in the csv file, replace the first and last double quote with single quotes and then read in the file specifying single quote as the quote character. Single quote can be replaced with any character that does not occur in the file.
"test.csv" |>
readLines() |>
sub('"(.*)"', "'\\1'", x = _) |>
read.csv(text = _, quote = "'")
## A B C
## 1 1 2 [{"C1":"C6.bmp","C2": "C5.bmp"}]
## 2 3 4 [{"C1":"C6.bmp","C2": "C5.bmp"}]
## 3 5 6 [{"C1":"C6.bmp","C2": "C5.bmp"}]
## 4 7 8 [{"C1":"C6.bmp","C2": "C5.bmp"}]
2) sed Alternately we could preprocess the file outside of R using sed and pipe that to read.csv
.
Below we have assumed
We used @ instead single quote to avoid some problems with its interpretation on the cmd line. Any character not in the file will do as long as it does not have a special interpretation by the cmd line processor.
If using Linux sed is included but the escaping may need to be modified slightly.
read.csv(pipe('sed -e "s/\\"/@/" -e "s/\\"$/@/" test.csv'), quote = "@")
3) read.fwf This only works if the fields are in the same positions in every line. This is the case in the question's example.
read.fwf("test.csv", c(1, 1, 1, 2, 32))[c(1, 3, 5)]
Generate input test.csv:
Lines <- 'A,B,C
1,2,"[{"C1":"C6.bmp","C2": "C5.bmp"}]"
3,4,"[{"C1":"C6.bmp","C2": "C5.bmp"}]"
5,6,"[{"C1":"C6.bmp","C2": "C5.bmp"}]"
7,8,"[{"C1":"C6.bmp","C2": "C5.bmp"}]"'
writeLines(Lines, "test.csv")
Upvotes: 0
Reputation: 73592
Using readLines()
, then gsub
commas between word characters to something e.g. an asterisk and strsplit
at it.
> s <- gsub(readLines("foo.csv"), pat="(?<=\\w),", rep="*", perl=TRUE) |>
+ strsplit('\\*')
> do.call('rbind.data.frame', s[-1]) |> setNames(s[[1]])
A B C
1 1 2 "[{"C1":"C6.bmp","C2": "C5.bmp"}]"
2 3 4 "[{"C1":"C6.bmp","C2": "C5.bmp"}]"
3 5 6 "[{"C1":"C6.bmp","C2": "C5.bmp"}]"
4 7 8 "[{"C1":"C6.bmp","C2": "C5.bmp"}]"
Upvotes: 0
Reputation: 44977
If you can control how that file is created, you could get it to use single quotes around the JSON strings, i.e.
A,B,C
1,2,'[{"C1":"C6.bmp","C2": "C5.bmp"}]'
3,4,'[{"C1":"C6.bmp","C2": "C5.bmp"}]'
5,6,'[{"C1":"C6.bmp","C2": "C5.bmp"}]'
7,8,'[{"C1":"C6.bmp","C2": "C5.bmp"}]'
and then read.csv
or read_csv
will read it properly as long as you specify quote = "'"
:
> read.csv("~/temp/test.csv", quote="'")
A B C
1 1 2 [{"C1":"C6.bmp","C2": "C5.bmp"}]
2 3 4 [{"C1":"C6.bmp","C2": "C5.bmp"}]
3 5 6 [{"C1":"C6.bmp","C2": "C5.bmp"}]
4 7 8 [{"C1":"C6.bmp","C2": "C5.bmp"}]
If you can't control that, you might be able to edit the file to look like that; in your sample, the only place "[
appears is at the start of the JSON, and the only place ]"
appears is at the end. If this is also true for your real data, then you could make that edit pretty easily.
If that's not feasible, then you could read it with quote = ""
, which will leave all the quotes in place. The trouble with this is that the comma in the JSON will be taken to be a column separator, so you'll have a lot of fixups to do afterwards.
Upvotes: 0