Reputation: 823
I have to analyze a .tsv file for a project, and I am fairly new to R. I am having a problem reading and writing the file in R. The problem seems to occur when a row contains a double quote (").
A few records from the original file are shown below:
org_id org_name description created at
5762 Artifice Artifice \comes from Latin 4/3/2014 19:42
1045 Access Dar Microsoft "Nasdaq worldwide 7/4/2014 10:34
345 Living Asset Lincoln Park Zoo 11/3/2014 19:42
2356 Adler Planet Mission of black cat 12/2/2014 11:03
I am reading the file with the following code line:
orgs <- read.delim("C:/Users/orgs.tsv", header=TRUE)
After renaming the columns, I write the file using the code below:
write.table(orgs, file = "C:/Users/orgs_updated.tsv", row.names=FALSE, sep="\t")
Now when I try to read this file (orgs_updated.tsv) in another program, it fails whenever a column contains a quote. I read the file back in using the code below:
orgs_updated <- read.delim("C:/Users/orgs_updated.tsv", sep="", header=TRUE, quote="\"")
and the file is read incorrectly: the row containing the quote is split in two, which adds a spurious row.
org_id name description created at
5762 Artifice Artifice \comes from Latin 4/3/2014 19:42
1045 Access Dar Microsoft Nasdaq worldwide
7/4/2014 10:34
345 Living Asset Lincoln Park Zoo 11/3/2014 19:42
2356 Adler Planet Mission of black cat 12/2/2014 11:03
I am not sure what I am doing wrong. I tried:
using the quote=FALSE option in write.table,
not using the quote option in the second read.delim,
changing sep="" to sep="\t",
but I could not find a solution.
I would appreciate any help!
Upvotes: 7
Views: 30676
Reputation: 3760
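Try loading the file with quote="" so that R treats the stray double quote as ordinary text rather than as the start of a quoted field (I created the file on my machine as comma-delimited instead of tab-delimited, hence sep=","; use sep="\t" for the original file):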
orgs <- read.delim("orgs.tsv", header=TRUE, allowEscapes=FALSE, sep=",", quote="", na.strings="", comment.char="")
write.table(orgs, file = "orgs_updated.tsv", row.names=FALSE, sep="\t")
orgs_updated <- read.delim("orgs_updated.tsv", sep="", header=TRUE, quote="\"")
orgs_updated
org_id org_name description created.at
1 5762 Artifice Artifice \\comes from Latin 4/3/2014 19:42
2 1045 Access Dar Microsoft "Nasdaq worldwide 7/4/2014 10:34
3 345 Living Asset Lincoln Park Zoo 11/3/2014 19:42
4 2356 Adler Planet Mission of black cat 12/2/2014 11:03
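For a file that really is tab-delimited, the same idea can be sketched as below. This is only a minimal illustration, not a tested recipe for the original data: the sample rows are retyped from the question, the column boundaries are my guess at where the tabs were, and the temporary file paths are placeholders. It turns quote handling off on every read and writes without quotes, so a lone " should pass through unchanged.

```r
# Sample rows retyped from the question; the tab positions are an assumption.
lines <- c(
  "org_id\torg_name\tdescription\tcreated at",
  "5762\tArtifice\tArtifice \\comes from Latin\t4/3/2014 19:42",
  "1045\tAccess Dar\tMicrosoft \"Nasdaq worldwide\t7/4/2014 10:34",
  "345\tLiving Asset\tLincoln Park Zoo\t11/3/2014 19:42"
)

# Placeholder paths standing in for orgs.tsv and orgs_updated.tsv.
src <- tempfile(fileext = ".tsv")
out <- tempfile(fileext = ".tsv")
writeLines(lines, src)

# quote = "" disables quote handling, so the lone " stays inside its field.
orgs <- read.delim(src, header = TRUE, sep = "\t", quote = "",
                   comment.char = "", stringsAsFactors = FALSE)

# quote = FALSE writes the fields bare, with no surrounding or escaped quotes.
write.table(orgs, file = out, sep = "\t", row.names = FALSE, quote = FALSE)

# Read the result back with the same settings.
orgs_updated <- read.delim(out, header = TRUE, sep = "\t", quote = "",
                           comment.char = "", stringsAsFactors = FALSE)
print(orgs_updated)
```

Because the written file contains no quote characters other than the one in the data, another program reading it should also be told not to treat " as a quoting character, if it has such an option.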
Upvotes: 6