Reputation: 606
I am importing an Excel file into R; however, some of the values change from the original by gaining a long string of extra decimal places.
For example, the original value may be 72.21 but it comes through as 72.209999999999994. I figure it has something to do with floating point numbers?
I know this example would round back to 72.21; however, sometimes it occurs on much smaller numbers.
I also need to ensure the reported value in the Excel sheet is what gets imported.
Does anyone have ideas on how to manage this?
Here is a sample of how the data is stored in Excel:
....
The "<" (less than) indicates a value below the detection limit and is handled elsewhere.
My code is: data <- read_excel(file.path)
What I get in R is:
Some fields need to be rounded to 2 decimal places and some to 3 or 4. My main concern is that numbers may get rounded or modified beyond the detection limit.
Upvotes: 4
Views: 4849
Reputation: 1
I had the same issue after adding text values to the data frame in Excel. After I removed those fields, the problem went away; only the first row of the table (the header) can be text. My solution was to use two Excel files for import: one with only numbers and a second with the text values.
Upvotes: 0
Reputation: 132989
2.65 cannot be represented exactly with floating point numbers:
sprintf("%.16f", 2.65)
#[1] "2.6499999999999999"
If you need higher precision (I seriously doubt that), you need to use a package for arbitrary precision numbers:
library(Rmpfr)
mpfr("2.65", 32)
#1 'mpfr' number of precision 32 bits
#[1] 2.6500000004
mpfr("2.65", 64)
#1 'mpfr' number of precision 64 bits
#[1] 2.65000000000000000009
mpfr("2.65", 128)
#1 'mpfr' number of precision 128 bits
#[1] 2.650000000000000000000000000000000000005
R uses 53 bits (see help("is.double")):
mpfr("2.65", 53)
#1 'mpfr' number of precision 53 bits
#[1] 2.6499999999999999
However, you need to consider that Excel also uses floating point numbers internally and can display a rounded representation [1]. My Excel version (2019) claims (in its help) to use 64-bit precision.
I'm extremely skeptical that you need to care about this. Using higher precision than the default is expensive (in developer time and computing resources), and if your workflow involves Excel I would be very surprised if you are doing something that needs it. I have never needed it myself.
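If the goal is simply to get back the value that Excel displays, plain rounding or formatted printing on the R side is enough. A minimal sketch (the two decimal places are an assumption taken from the question's example):
x <- 72.209999999999994
round(x, 2)          # [1] 72.21  (still a double, printed at default precision)
sprintf("%.2f", x)   # [1] "72.21"  (fixed two-decimal character output)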
[1] It actually rounds for display by default. I have just tested that.
Upvotes: 3
Reputation: 356
Here's my solution with dplyr.
It would be great to have the actual data to work with, but since it wasn't shared I just made some random numbers in an Excel file:
# STEP 1: IMPORT LIBRARIES:
library(readxl) # Needed for 'read_excel'
library(dplyr) # Needed for manipulation
# Step 2: Import data as dataframe:
data.unformatted = as.data.frame(read_excel("C:/Users/A/Documents/scratchboard/sample numbers.xlsx"))
# Step 3: round the numbers to 2 digits:
data.only.2.dig = data.unformatted %>%
  mutate_if(is.numeric, round, digits=2)
# Alternatively, steps 2 + 3 in one line:
data = as.data.frame(
  read_excel("C:/Users/A/Documents/scratchboard/sample numbers.xlsx")
) %>%
  mutate_if(is.numeric, round, digits=2)
#OUTPUT:
# Notice that it only affects the Num column, which is the 6th
> head(data.unformatted)
number city dates classes lt Num
1 1 Boise 2020-01-01 52 solid 0.7973496
2 2 Boise 2020-02-01 36 solid 0.2830991
3 3 Boise 2020-03-01 69 solid 0.1539214
4 4 Boise 2020-04-01 100 solid 0.1153002
5 5 Boise 2020-05-01 72 solid 0.1745657
6 6 Pocatello 2020-01-01 82 dashed 0.2586195
> head(data.only.2.dig)
number city dates classes lt Num
1 1 Boise 2020-01-01 52 solid 0.80
2 2 Boise 2020-02-01 36 solid 0.28
3 3 Boise 2020-03-01 69 solid 0.15
4 4 Boise 2020-04-01 100 solid 0.12
5 5 Boise 2020-05-01 72 solid 0.17
6 6 Pocatello 2020-01-01 82 dashed 0.26
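Since the question mentions that some fields need 2 decimal places and others 3 or 4, a per-column variant can round each column to its own precision. Here Num and classes from the made-up data stand in for the real columns:
# Round each column to its own number of digits (column choices are just placeholders):
data.mixed.digits = data.unformatted %>%
  mutate(Num     = round(Num, 4),       # e.g. a column reported to 4 decimals
         classes = round(classes, 2))   # e.g. a column reported to 2 decimals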
Next time, please share the actual data so we can reproduce your problem.
Upvotes: 0