Tim.Lucas

Reputation: 271

R not importing csv file correctly

I have a strange problem with R: it does not correctly import a CSV file that I exported from Excel. Here is the file (I checked that the text matches the cell values in Excel):

REGION;TYPE;CODE;BILL
A;X;871685920001760387;003007614504
B;Y ;871685920001765726;003007638434
C;Z;871685920001804326;003211001858

These are the contents of my CSV file, which I saved as "Example.csv". Now I want to import this file into R:

Ex <- read.csv2("Example.csv", header = TRUE, sep = ";")

I specifically need the CODE column to match, because I use these values to compare against some files that I have stored elsewhere. However, when I compare the imported values with the text file (and the cell values in Excel), using options(digits = 19), I get:

Ex$CODE
[1] 871685920001760384 871685920001765760 871685920001804288

As you can see, these values do not match: the last few digits are different. Trying as.character() gives the same results:

as.character(Ex$CODE)
[1] "871685920001760384" "871685920001765760" "871685920001804288"
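The behaviour can be reproduced without Excel. The sketch below is an illustration, not part of the original question: it assumes a temporary file (via tempfile()) standing in for Example.csv, and R 4.0 or later, where text columns are read as character by default. It also shows why stringsAsFactors = FALSE cannot help: that argument only affects columns that are read as text, and CODE is converted to a number.

```r
# Hypothetical temporary file standing in for the asker's Example.csv
path <- tempfile(fileext = ".csv")
writeLines(c("REGION;TYPE;CODE;BILL",
             "A;X;871685920001760387;003007614504",
             "B;Y ;871685920001765726;003007638434",
             "C;Z;871685920001804326;003211001858"), path)

# stringsAsFactors only affects text columns, so CODE is still converted
Ex <- read.csv2(path, header = TRUE, sep = ";", stringsAsFactors = FALSE)

class(Ex$CODE)            # "numeric": CODE was converted to double
sprintf("%.0f", Ex$CODE)  # the exact stored values, which differ from the file
Ex$BILL                   # BILL became numeric too, so its leading zeros are gone
```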

Does anyone know how to fix this problem? I also tried stringsAsFactors = FALSE, which did not help.

Thanks in advance!

Upvotes: 5

Views: 10264

Answers (2)

James

Reputation: 66874

@JakeBurkhead gave the solution, but the reason this happens is that read.csv by default converts any column whose values look like numbers into a numeric column. Numeric values in R are stored as doubles (double-precision floating-point numbers), so they are subject to the precision limits of floating-point arithmetic.
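Unless colClasses is given, read.table() (which read.csv() and read.csv2() wrap) reads every column as text and then guesses its type with type.convert(). Calling type.convert() directly on single values from the file shows the guess; this is an illustrative sketch, and as.is = TRUE is passed only to keep non-numeric text as character.

```r
# type.convert() is what read.table() uses to guess column types
code <- type.convert("871685920001760387", as.is = TRUE)
class(code)   # "numeric": the 18-digit string became a double

bill <- type.convert("003007614504", as.is = TRUE)
bill          # 3007614504: the leading zeros of BILL are dropped as well
```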

This is how R interprets the value:

print(871685920001760387,digits=18)
[1] 871685920001760384

A double carries 53 bits of precision, so every integer up to 2^53 (about 9 × 10^15, a little less than 10^16) can be stored exactly. Your number is almost 10^18; at that magnitude adjacent doubles are 128 apart, so it cannot be represented exactly down to the unit level.
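The arithmetic above can be checked in R itself. This short sketch uses sprintf("%.0f", ...) to print the exact integer value a double holds, and computes the spacing between neighbouring doubles near the CODE value as 2^(exponent - 52).

```r
x <- 871685920001760387

sprintf("%.0f", 2^53)       # "9007199254740992": all integers up to here are exact
(2^53 + 1) == 2^53          # TRUE: the next integer already rounds back to 2^53

# Spacing between adjacent doubles near x: 2^(floor(log2(x)) - 52)
gap <- 2^(floor(log2(x)) - 52)
gap                         # 128

sprintf("%.0f", x)          # "871685920001760384": the nearest double to the typed value
```

Because 871685920001760387 lies between the representable values ...384 and ...512, it is rounded to the closer one, ...384, which is exactly the value R printed in the question.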

Upvotes: 6

Jake Burkhead

Reputation: 6545

You can read them all in as characters by setting colClasses.

 > Ex = read.table("Example.csv", sep = ";", header = TRUE, colClasses = "character")
 > Ex
   REGION TYPE               CODE         BILL
 1      A    X 871685920001760387 003007614504
 2      B   Y  871685920001765726 003007638434
 3      C    Z 871685920001804326 003211001858
 > sapply(Ex, class)
      REGION        TYPE        CODE        BILL
 "character" "character" "character" "character"
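If only some columns should stay as text, colClasses also accepts a named vector; names are matched to the column names and the remaining columns are converted as usual. The sketch below assumes a temporary file in place of Example.csv and shows that an exact string comparison against the original CODE values then works, and that BILL keeps its leading zeros.

```r
# Hypothetical temporary file standing in for Example.csv
path <- tempfile(fileext = ".csv")
writeLines(c("REGION;TYPE;CODE;BILL",
             "A;X;871685920001760387;003007614504",
             "B;Y ;871685920001765726;003007638434",
             "C;Z;871685920001804326;003211001858"), path)

# Force only CODE and BILL to character; other columns use the default guessing
Ex <- read.csv2(path, colClasses = c(CODE = "character", BILL = "character"))

Ex$CODE                             # the digits exactly as they appear in the file
"871685920001760387" %in% Ex$CODE   # TRUE: string comparison is now exact
Ex$BILL[1]                          # "003007614504": leading zeros preserved
```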

Upvotes: 8
