Reputation: 52840
Notice that the R's base command read.csv works such that
read.csv(file=fileName, dec=".", sep=",", header=T)
whilefread
does not work in the following demo, having quoted separators inline. We concentrate now on the data.table and fread, read.csv too slow.
I have a csv file that has comma as a field value separators and dot as a decimal point. The file MyFile.csv
has field names with commas and dots, such as "Product.Apple.Green,Purple"
where double quotes try to separate dots and commas. However, the separation with double quotes does not work with the fread such that
Sys.setlocale('LC_NUMERIC', 'fi_FI.UTF-8')
fread(file="MyFile.csv", sep=",", dec=".")
so for example the fields
`CustomerID, ProductID, Prod.Sub, "Prod.Sub,feature", A.B.C, "A,B,C,D"`
is read, with dash -
denoting field separation now, as
`CustomerID - ProductID - Prod.Sub - "Prod.Sub - feature" - A.B.C - "A - B - C - D"`
where "Prod.Sub, feature"
is wrongly read as two fields "Prod.Sub - feature"
and "A,B,C,D"
is wrongly read as "A - B - C - D"
.
How can I inline escape separators with data.table's fread?
Upvotes: 0
Views: 840
Reputation: 1751
Cool thing about fread
, it can use system commands (at least on Unix).
So make use of sed
, for example:
fread("sed 's/,\ /| /g' MyFile.csv", sep = "|", dec = ".")
Replace |
with any separator that might not conflict.
Note: In your text example, the delimiter is shown as ,
+ (space). Without space the solution does not work!
Also: without space (only ,
), the quoted fields get processed by fread
correctly using data.table
version 1.10.0
Upvotes: 1