Reputation: 606
I am importing an Excel file into R; however, some of the values change from the original by gaining a long string of extra decimal places.
For example, the original value may be 72.21 but it comes through as 72.209999999999994. I figure it has something to do with floating point numbers?
I know this example would round back to 72.21; however, sometimes it occurs on much smaller numbers.
I also need to ensure the reported value in the Excel sheet is what gets imported.
Does anyone have ideas on how to manage this?
Here is a sample of how the data is stored in Excel:
....
The "<" (less than) indicates a value below the detection limit and is handled elsewhere.
My code is: data <- read_excel(file.path)
What I get in R is:
Some fields need to be rounded to 2 decimal places and some to 3 or 4. My main concern is that numbers may get rounded or modified beyond the detection limit.
Upvotes: 4
Views: 4849
Reputation: 1
I had the same issue after adding text values to the data frame in Excel. After I removed those fields, the problem went away; only the first row of the table (the header) can be text. My solution was to use two Excel files for import: one with only numbers and a second with the text values.
Upvotes: 0
Reputation: 132989
2.65 cannot be represented exactly with floating point numbers:
sprintf("%.16f", 2.65)
#[1] "2.6499999999999999"
If you need higher precision (I seriously doubt that), you need to use a package for arbitrary precision numbers:
library(Rmpfr)
mpfr("2.65", 32)
#1 'mpfr' number of precision 32 bits
#[1] 2.6500000004
mpfr("2.65", 64)
#1 'mpfr' number of precision 64 bits
#[1] 2.65000000000000000009
mpfr("2.65", 128)
#1 'mpfr' number of precision 128 bits
#[1] 2.650000000000000000000000000000000000005
R uses 53 bits (see help("is.double")):
mpfr("2.65", 53)
#1 'mpfr' number of precision 53 bits
#[1] 2.6499999999999999
However, you need to consider that Excel also uses floating point numbers internally and can display a rounded representation [1]. My Excel version (2019) claims (in its help) to use 64-bit precision.
I'm extremely skeptical that you need to care about this. Using higher precision than the default is expensive (in developer time and computing resources), and if your workflow involves Excel I would be very surprised if you are doing something that needs it. I have never needed it myself.
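If the goal is simply to get back the value that Excel displays, plain rounding or formatted printing on the R side is enough. A minimal sketch (the two decimal places are an assumption taken from the question's example):
x <- 72.209999999999994
round(x, 2)          # [1] 72.21  (still a double, printed at default precision)
sprintf("%.2f", x)   # [1] "72.21"  (fixed two-decimal character output)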
[1] It actually rounds for display by default. I have just tested that.
Upvotes: 3
Reputation: 356
Here's my solution with dplyr.
It would be great to have the actual data to work with, but since it wasn't shared I just made some random numbers in an Excel file:
# STEP 1: IMPORT LIBRARIES:
library(readxl) # Needed for 'read_excel'
library(dplyr) # Needed for manipulation
# Step 2: Import data as dataframe:
data.unformatted = as.data.frame(read_excel("C:/Users/A/Documents/scratchboard/sample numbers.xlsx"))
# Step 3: round the numbers to 2 digits:
data.only.2.dig = data.unformatted %>%
  mutate_if(is.numeric, round, digits=2)
# Alternatively, steps 2 + 3 in one line:
data = as.data.frame(
  read_excel("C:/Users/A/Documents/scratchboard/sample numbers.xlsx")
) %>%
  mutate_if(is.numeric, round, digits=2)
#OUTPUT:
# Notice that it only affects the Num column, which is the 6th
> head(data.unformatted)
number city dates classes lt Num
1 1 Boise 2020-01-01 52 solid 0.7973496
2 2 Boise 2020-02-01 36 solid 0.2830991
3 3 Boise 2020-03-01 69 solid 0.1539214
4 4 Boise 2020-04-01 100 solid 0.1153002
5 5 Boise 2020-05-01 72 solid 0.1745657
6 6 Pocatello 2020-01-01 82 dashed 0.2586195
> head(data.only.2.dig)
number city dates classes lt Num
1 1 Boise 2020-01-01 52 solid 0.80
2 2 Boise 2020-02-01 36 solid 0.28
3 3 Boise 2020-03-01 69 solid 0.15
4 4 Boise 2020-04-01 100 solid 0.12
5 5 Boise 2020-05-01 72 solid 0.17
6 6 Pocatello 2020-01-01 82 dashed 0.26
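Since the question mentions that some fields need 2 decimal places and others 3 or 4, a per-column variant can round each column to its own precision. Here Num and classes from the made-up data stand in for the real columns:
# Round each column to its own number of digits (column choices are just placeholders):
data.mixed.digits = data.unformatted %>%
  mutate(Num     = round(Num, 4),       # e.g. a column reported to 4 decimals
         classes = round(classes, 2))   # e.g. a column reported to 2 decimals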
Next time, please share the actual data so we can reproduce your problem.
Upvotes: 0