hhh
hhh

Reputation: 52840

fread: how to espace separators inline in field values?

Notice that the R's base command read.csv works such that read.csv(file=fileName, dec=".", sep=",", header=T) while fread does not work in the following demo, having quoted separators inline. We concentrate now on the data.table and fread, read.csv too slow.


I have a csv file that has comma as a field value separators and dot as a decimal point. The file MyFile.csv has field names with commas and dots, such as "Product.Apple.Green,Purple" where double quotes try to separate dots and commas. However, the separation with double quotes does not work with the fread such that

Sys.setlocale('LC_NUMERIC', 'fi_FI.UTF-8')   
fread(file="MyFile.csv", sep=",", dec=".")

so for example the fields

`CustomerID, ProductID, Prod.Sub, "Prod.Sub,feature", A.B.C, "A,B,C,D"` 

is read, with dash - denoting field separation now, as

`CustomerID - ProductID - Prod.Sub - "Prod.Sub - feature" - A.B.C - "A - B - C - D"` 

where "Prod.Sub, feature" is wrongly read as two fields "Prod.Sub - feature" and "A,B,C,D" is wrongly read as "A - B - C - D".

How can I inline escape separators with data.table's fread?

Upvotes: 0

Views: 840

Answers (1)

setempler
setempler

Reputation: 1751

Cool thing about fread, it can use system commands (at least on Unix).

So make use of sed, for example:

fread("sed 's/,\ /| /g' MyFile.csv", sep = "|", dec = ".")

Replace | with any separator that might not conflict.

Note: In your text example, the delimiter is shown as , + (space). Without space the solution does not work!

Also: without space (only ,), the quoted fields get processed by fread correctly using data.table version 1.10.0

Upvotes: 1

Related Questions