James O'Donnell
James O'Donnell

Reputation: 85

Convert Number Stored as Text in Excel File Using R

I would like to output data from R to an excel file, and have been using the openxlsx package.

writeDataTable is the function I have been using. The problem is that I want the entries of one of the columns to be a mixture of numbers and text. As a column of a dataframe in R must have all the same type of entries, everything gets converted to character entries.

When I open the excel file that is outputted I get the little green triangle in the corner of the number cells, telling me that they are a number stored as text and giving me the option to convert to number.

I'm wondering is it possible to get R to convert this for me in the Workbook Object that is created before saving the file.

I've spent a lot of time googling the problem, looking for different libraries that might help, but haven't been able to find anything so far. I have an example below. Note that this isn't the exact data frame I am using, but is for demonstrative purposes.

df = data.frame(A = c('Dog', '5', '7.04'), B = c('Cat', '12', '1.23'))
wb = createWorkbook()
addWorksheet(wb, "Sheet2")
writeDataTable(wb, "Sheet2", df)
output_file = "C:\\Users\\johndoe\\documents\\excel_file.xlsx"
saveWorkbook(wb, output_file)

Any help would be much appreciated.

Upvotes: 8

Views: 4540

Answers (2)

Richard Gallardo
Richard Gallardo

Reputation: 1

A more scalable workaround can be done that is similar to one suggested by @MKa which involved modifying the dataframe values in R as a formula:

library(openxlsx)
df <- data.frame(A = c('Dog', '5', '7.04'), B = c('Cat', '12', '1.23'))
wb <- createWorkbook()
addWorksheet(wb, "Sheet2")

# Modify df vals to formula class
df$A <- ifelse(!is.na(as.numeric(df$A)), 
           df$A,
           paste0("=\"", df$A, "\""))
class(df$A) <- "formula"
df$B <- ifelse(!is.na(as.numeric(df$B)), 
           df$B,
           paste0("=\"", df$B, "\""))
class(df$B) <- "formula"

writeDataTable(wb, "Sheet2", df)
saveWorkbook(wb, 'excel_file.xlsx', overwrite = T)

Main caveat is issues reading the same file using openxlsx:

Formulae written using writeFormula to a Workbook object will not get picked up by read.xlsx(). This is because only the formula is written and left to be evaluated when the file is opened in Excel. Opening, saving and closing the file with Excel will resolve this.

There are solutions to read the file still using other libraries but may not be ideal:

Upvotes: 0

r2evans
r2evans

Reputation: 160942

Here's a way, but it will be painfully slow.

### unchanged
library(openxlsx)
df = data.frame(A = c('Dog', '5', '7.04'), B = c('Cat', '12', '1.23'))
wb = createWorkbook()
addWorksheet(wb, "Sheet2")
writeDataTable(wb, "Sheet2", df)

### this is the new part
for (cn in seq_len(ncol(df))) {
  for (rn in seq_len(nrow(df))) {
    if (!is.numeric(df[rn,cn]) && !is.na(val <- as.numeric(as.character(df[rn,cn])))) {
      writeData(wb, "Sheet2", val, startCol = cn, startRow = 1L + rn)
    }
  }
}

### unchanged
saveWorkbook(wb, output_file)

The catch is that this is writing one cell at a time.

Ways to improve this:

  • If you have more numbers than strings, you might want to reverse this a little by converting the relevant columns with as.numeric (producing a lot of NAs), and then one-by-one over-write a cell's NA value with the previous string.

  • You can look for runs of cells to replace (perhaps using rle within a column), which will write clump-by-clump instead of cell-by-cell.

Upvotes: 3

Related Questions