nasia jaffri

Reputation: 823

Reading and Writing .TSV files in R

I have to analyze a .tsv file for a project, and I am fairly new to R. I am having trouble reading and writing the file in R. The problem seems to occur when a row contains a double quote (").

A few records from the original file are shown below:

org_id    org_name        description                    created at     
5762      Artifice        Artifice \comes from Latin     4/3/2014 19:42
1045      Access Dar      Microsoft "Nasdaq worldwide    7/4/2014 10:34
345       Living Asset    Lincoln Park Zoo               11/3/2014 19:42
2356      Adler Planet    Mission of black cat           12/2/2014 11:03

I am reading the file with the following code line:

orgs <- read.delim("C:/Users/orgs.tsv", header=TRUE)

After renaming the columns, I write the file using the code below:

write.table(orgs, file = "C:/Users/orgs_updated.tsv", row.names=FALSE, sep="\t")

When I then try to read this file (orgs_updated.tsv) in another program, the read fails whenever a column contains a quote. Reading the file back into R with the code below shows the same problem:

orgs_updated <- read.delim("C:/Users/orgs_updated.tsv", sep="", header=TRUE, quote="\"")

The file is read incorrectly: the row containing the quote is split, which adds a spurious extra row.

org_id    name        description                    created at     
5762      Artifice        Artifice \comes from Latin     4/3/2014 19:42
1045      Access Dar      Microsoft                      Nasdaq worldwide    
7/4/2014 10:34
345       Living Asset    Lincoln Park Zoo               11/3/2014 19:42
2356      Adler Planet    Mission of black cat           12/2/2014 11:03

I am not sure what I am doing wrong. I tried:

using the quote=FALSE option in write.table
not using the quote option in the second read.delim
changing sep="" to sep="\t"

but I was not able to find a solution.

I would appreciate any help.

Upvotes: 7

Views: 30676

Answers (1)

Myles Baker

Reputation: 3760

Try loading the file as follows. I recreated the file on my machine as comma-delimited rather than tab-delimited, hence sep=","; use sep="\t" for the real file:

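# quote="" turns off quote handling on input, so a stray " stays ordinary text
orgs <- read.delim("orgs.tsv", header=TRUE, allowEscapes=FALSE, sep=",",  quote="", na.strings="", comment.char="")
# write.table quotes character fields by default
write.table(orgs, file = "orgs_updated.tsv", row.names=FALSE, sep="\t")
# read the rewritten file back, treating " as the quote character
orgs_updated <- read.delim("orgs_updated.tsv", sep="", header=TRUE, quote="\"")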

orgs_updated
  org_id     org_name                 description      created.at
1   5762     Artifice Artifice \\comes from Latin  4/3/2014 19:42
2   1045   Access Dar Microsoft "Nasdaq worldwide  7/4/2014 10:34
3    345 Living Asset            Lincoln Park Zoo 11/3/2014 19:42
4   2356 Adler Planet        Mission of black cat 12/2/2014 11:03
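
If the other program expects a plain TSV with no quoting at all, a variant that may work is to disable quoting in both directions and keep the tab separator throughout. The following is only a sketch under assumptions: the data frame is a hypothetical reconstruction of the four rows from the question, the column is named created_at (not "created at") to sidestep R's name mangling, and a temporary file stands in for the real path.

```r
# Hypothetical sample data mirroring the rows shown in the question
orgs <- data.frame(
  org_id      = c(5762L, 1045L, 345L, 2356L),
  org_name    = c("Artifice", "Access Dar", "Living Asset", "Adler Planet"),
  description = c("Artifice \\comes from Latin",
                  "Microsoft \"Nasdaq worldwide",
                  "Lincoln Park Zoo",
                  "Mission of black cat"),
  created_at  = c("4/3/2014 19:42", "7/4/2014 10:34",
                  "11/3/2014 19:42", "12/2/2014 11:03"),
  stringsAsFactors = FALSE
)

# Temporary file used here in place of the real path (assumption)
path <- tempfile(fileext = ".tsv")

# Write without quoting, so the file holds only the raw field text and tabs
write.table(orgs, file = path, sep = "\t", quote = FALSE, row.names = FALSE)

# Sanity check: every line, header included, should have the same field count
field_counts <- count.fields(path, sep = "\t", quote = "", comment.char = "")

# Read back with quoting disabled, so the stray " is kept as ordinary text
orgs_updated <- read.delim(path, header = TRUE, sep = "\t", quote = "",
                           comment.char = "", stringsAsFactors = FALSE)

print(field_counts)
print(orgs_updated)
```

The design choice is simply that the quote setting matches on the write and the read (none in both cases) and the separator is an explicit tab rather than sep="", which splits on any whitespace. This only holds as long as no field contains a tab or a newline itself.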

Upvotes: 6
