Jason Green
Jason Green

Reputation: 171

Issue when importing float as string from Excel. Adding precision incorrectly

Using openxlsx read.xlsx to import a dataframe from a multi-class column. The desired result is to import all values as strings, exactly as they're represented in Excel. However, some decimals are represented as very long floats.

Sample data is simply an Excel file with a column containing the following rows:

abc123, 
556.1, 
556.12, 
556.123, 
556.1234, 
556.12345 

require(openxlsx)
df <- read.xlsx('testnumbers.xlsx', )   

Using the above R code to read the file results in df containing these string

values:
abc123, 
556.1, 
556.12,
556.12300000000005, 
556.12339999999995, 
556.12345000000005

The Excel file provided in production has the column formatted as "General". If I format the column as Text, there is no change unless I explicitly double-click each cell in Excel and hit enter. In that case, the number is correctly displayed as a string. Unfortunately, clicking each cell isn't an option in the production environment. Any solution, Excel, R, or otherwise is appreciated.

*Edit: I've read through this question and believe I understand the math behind what's going on. At this point, I suppose I'm looking for a workaround. How can I get a float from Excel to an R dataframe as text without changing the representation?

Why Are Floating Point Numbers Inaccurate?

Upvotes: 3

Views: 1883

Answers (2)

Matt L.
Matt L.

Reputation: 2964

Here is a workaround in R using openxlsx that I used to solve a similar issue. I think it will solve your question, or at least allow you to format as text in the excel files programmatically.

I will use it to reformat specific cells in a large number of files (I'm converting from general to 'scientific' in my case- as an example of how you might alter this for another format). This uses functions in the openxlsx package that you reference in the OP

First, load the xlsx file in as a workbook (stored in memory, which preserves all the xlsx formatting/etc; slightly different than the method shown in the question, which pulls in only the data):

testnumbers <- loadWorkbook(here::here("test_data/testnumbers.xlsx"))

Then create a "style" to apply which converts the numbers to "text" and apply it to the virtual worksheet (in memory).

numbersAsText <- createStyle(numFmt = "TEXT")
addStyle(testnumbers, sheet = "Sheet1", style = numbersAsText, cols = 1, rows = 1:10)

finally, save it back to the original file:

saveWorkbook(testnumbers, 
               file = here::here("test_data/testnumbers_formatted.xlsx"),
               overwrite = T)

When you open the excel file, the numbers will be stored as "text"

Upvotes: 1

Jason Green
Jason Green

Reputation: 171

I was able to get the correct formats into a data frame using pandas in python.

import pandas as pd
test = pd.read_excel('testnumbers.xlsx', dtype = str)

This will suffice as a workaround, but I'd like to see a solution built in R.

Upvotes: 1

Related Questions