I have a large file that looks like this:
region type coeff p-value distance count
82365593523656436 A -0.9494 0.050 -16479472.5 8
82365593523656436 B 0.47303 0.526 57815363.0 8
82365593523656436 C -0.8938 0.106 42848210.5 8
When I read it in using fread, the value 82365593523656436 can no longer be found:
correlations <- data.frame(fread('all_to_all_correlations.txt'))
> "82365593523656436" %in% correlations$region
[1] FALSE
I can find a slightly different number
> "82365593523656432" %in% correlations$region
[1] TRUE
but this number is not in the actual file:
grep 82365593523656432 all_to_all_correlations.txt
gives no results, while
grep 82365593523656436 all_to_all_correlations.txt
does.
When I read in the small sample file shown above instead of the full file, I get:
Warning message:
In fread("test.txt") :
Some columns have been read as type 'integer64' but package bit64 isn't loaded.
Those columns will display as strange looking floating point data.
There is no need to reload the data.
Just require(bit64) to obtain the integer64 print method and print the data again.
and the data looks like this:
region type coeff p.value distance count
1 3.758823e-303 A -0.94940 0.050 -16479472 8
2 3.758823e-303 B 0.47303 0.526 57815363 8
3 3.758823e-303 C -0.89380 0.106 42848210 8
So I think 82365593523656436 was changed into 82365593523656432 while the file was being read. How can I prevent this from happening?
Reputation: 132706
IDs (and that's apparently what the first column is) should usually be read as characters:
correlations <- setDF(fread('region type coeff p-value distance count
82365593523656436 A -0.9494 0.050 -16479472.5 8
82365593523656436 B 0.47303 0.526 57815363.0 8
82365593523656436 C -0.8938 0.106 42848210.5 8',
colClasses = c(region = "character")))
str(correlations)
#'data.frame': 3 obs. of 6 variables:
# $ region : chr "82365593523656436" "82365593523656436" "82365593523656436"
# $ type : chr "A" "B" "C"
# $ coeff : num -0.949 0.473 -0.894
# $ p-value : num 0.05 0.526 0.106
# $ distance: num -16479473 57815363 42848211
# $ count : int 8 8 8
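As for why the number changes: a likely explanation (an assumption, since the full file is not shown) is that the column ends up as a double somewhere along the way. Doubles have a 53-bit mantissa, and between 2^56 and 2^57 the representable values are 16 apart, so 82365593523656436 gets rounded to the nearest multiple of 16. A minimal base R sketch of that effect, with illustrative variable names:

```r
# Base R only; id_chr and id_num are illustrative names.
id_chr <- "82365593523656436"   # the ID exactly as it appears in the file
id_num <- as.numeric(id_chr)    # what you get if the ID is stored as a double

# Above 2^53 not every integer can be represented as a double.
id_num > 2^53
# [1] TRUE

# Print the value the double actually holds.
sprintf("%.0f", id_num)
# [1] "82365593523656432"

# Two different literals collapse to the same double.
82365593523656436 == 82365593523656432
# [1] TRUE

# Character comparison stays exact, which is why reading IDs as character is safe.
id_chr == "82365593523656432"
# [1] FALSE
```

If you need the column for every large integer column rather than just `region`, fread also has an `integer64` argument that accepts `"character"`; and if you actually need arithmetic on the values, loading bit64 as the warning suggests is the other route.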