Reputation: 612
I have tab-separated data with a column of addresses, and the addresses contain commas.
I am using read.table to import the data into R, but my colleague used read.csv with sep="\t" to do the same, and we end up with different numbers of rows in the imported data frames.
Also, when I import the data in Excel, I get the same number of records as read.csv with sep="\t".
What is the most concrete way I can verify which import, and which number of records, is the correct one?
Please let me know what details I can add here to help answer the question.
Upvotes: 0
Views: 1842
Reputation: 4807
Read the help files for the two functions via ?read.table (that'll show both). You'll see that read.csv is just read.table with some of the arguments set to different defaults. One of those arguments is header. In read.table with sep="\t", try also using header=TRUE.
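If you'd rather see the differing defaults side by side than read them off the help page, here is a minimal sketch using formals(). The selection of arguments in args_of_interest is my own choice of the ones most likely to matter, not an exhaustive list.

```r
# Arguments I picked as worth comparing (an assumption, not a complete list)
args_of_interest <- c("header", "sep", "quote", "dec", "fill", "comment.char")

# Pull the default values straight from the function definitions
csv_defaults   <- formals(utils::read.csv)[args_of_interest]
table_defaults <- formals(utils::read.table)[args_of_interest]

# Print them next to each other; deparse() shows each default as R code
for (a in args_of_interest) {
  cat(sprintf("%-13s read.table: %-20s read.csv: %s\n",
              a,
              deparse(table_defaults[[a]]),
              deparse(csv_defaults[[a]])))
}
```

Any argument where the two columns differ, and which you did not set yourself, is a candidate for explaining the different row counts.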
If that doesn't work, do the following: read.table('file.txt', header=TRUE, sep="\t", quote="\"", dec=".", fill=TRUE, comment.char=""). That code should give the exact same result as read.csv with sep="\t", because I just set all the arguments to those used by read.csv. You can then begin changing some of those arguments back to the read.table default (by not specifying them) to figure out which argument is causing the difference between read.csv and read.table for your data.frame (remember, more than one argument could be causing the difference). I can easily see ways that the header, sep, quote, comment.char, and fill arguments could affect the number of rows in the output. I can't imagine how dec would have this effect, but I wouldn't be surprised if it matters.
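The one-argument-at-a-time approach described above could be sketched roughly like this. The sample file, the csv_args list, and the count_rows helper are all made up for illustration; point file at your own data instead.

```r
# Hypothetical sample file standing in for your data
tmp <- tempfile(fileext = ".txt")
writeLines(c(
  "id\taddress",
  "1\t12 Main St, Springfield",
  "2\t5 O'Brien Rd, Dublin",
  "3\tApt #4, 9 High St",
  "4\t7 King's Ct, Leeds"
), tmp)

# Arguments as read.csv sets them, with sep overridden to a tab
csv_args <- list(file = tmp, header = TRUE, sep = "\t", quote = "\"",
                 dec = ".", fill = TRUE, comment.char = "")

# read.table's own defaults for the arguments under suspicion
table_defaults <- list(header = FALSE, quote = "\"'",
                       fill = FALSE, comment.char = "#")

# Row count for a given argument list; NA if the import fails outright
count_rows <- function(args) {
  tryCatch(suppressWarnings(nrow(do.call(read.table, args))),
           error = function(e) NA_integer_)
}

baseline <- count_rows(csv_args)

# Revert one argument at a time and compare against the baseline
for (arg in names(table_defaults)) {
  trial <- csv_args
  trial[[arg]] <- table_defaults[[arg]]
  cat(arg, ": ", count_rows(trial), " rows (baseline ", baseline, ")\n",
      sep = "")
}
```

Any argument whose row count differs from the baseline is a suspect. With addresses, I'd look hardest at quote (apostrophes can be read as opening quotes) and comment.char (a # in an address starts a comment), but check it on your own file rather than take my word for it. The warnings are suppressed here only to keep the output short; on real data they are worth reading.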
As a rule, I tend to expect that different input = different output, and when different input = same output, I consider that to be exceptional. The functions you're using are similar, but their differences are different ways of interpreting the text file, so I would expect them to yield different results. Which is "right" is not a matter of one of the functions performing correctly and the other incorrectly; it's a matter of the user understanding what they are doing in relation to the input.
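To get a handle on what the input actually contains, independent of either import function, one option is to count raw lines and fields per line with all quote and comment handling switched off. This is only a sketch: the sample file is hypothetical, and it assumes one record per line, which will not hold if your file has line breaks inside quoted fields.

```r
# Hypothetical sample file; replace tmp with the path to your own data
tmp <- tempfile(fileext = ".txt")
writeLines(c(
  "id\taddress",
  "1\t12 Main St, Springfield",
  "2\t5 O'Brien Rd, Dublin",
  "3\tApt #4, 9 High St"
), tmp)

# Raw line count, including the header line
raw <- readLines(tmp)
n_lines <- length(raw)

# Lines that are not empty or whitespace-only
n_nonblank <- sum(nzchar(trimws(raw)))

# Fields per line, treating quotes and '#' as ordinary characters
fields <- count.fields(tmp, sep = "\t", quote = "", comment.char = "")

cat("lines:", n_lines, " non-blank:", n_nonblank, "\n")
print(table(fields, useNA = "ifany"))
```

If every line has the same number of fields, then the number of non-blank lines minus the header is the record count I'd expect a correct import to return. Lines with an unexpected field count are the ones to open in a text editor and inspect by hand.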
Upvotes: 2