Dutt
Dutt

Reputation: 401

How to convert only cells with digits to numeric from character in a dataframe in R and save to an excel file

I want to convert only the cells that has digits to numeric and save it to an excel file.

For example, The dataframe is similar to this.

df <- data.frame(row1 = c('agri','1','-', NA),row2 = c('3','import','6', 'food'), stringsAsFactors = F)

After I save it to an excel file with

openxlsx::write.xlsx(df, "df.xlsx", row.names=FALSE)

the digits in the excel file are characters.

enter image description here

It is because in the dataframe these columns are 'char'.

> str(df)

'data.frame': 4 obs. of 2 variables:

$ row1: chr "agri" "1" "-" NA

$ row2: chr "3" "import" "6" "food"

I tried mutate() and also transform(). But those only works with the entire column.

Is it possible to do it in R?

Any help is much appreciated.

Upvotes: 0

Views: 1238

Answers (2)

denfly
denfly

Reputation: 13

A bit late response but I was also looking for this. I know exactly what you wanted (We want the numbers in the excel sheets to be numbers, so they can be sorted as numbers, and the text as characters), and there is a way described here Convert Number Stored as Text in Excel File Using R

It is painstakingly slow, and I am looking for a faster way to do it. So, if anyone is more experienced in this, please have a stab.

Upvotes: 0

Maurits Evers
Maurits Evers

Reputation: 50668

A late comment/response.

tldr; I don't think this is possible.

openxlsx does allow for cell-level formatting (via addStyle), and you could be tempted to use addStyle to define different formats for different cells à la

library(openxlsx)
wb <- createWorkbook()

addWorksheet(wb, "sheet1")
writeData(wb, sheet = 1, df)

addStyle(wb, sheet = 1, createStyle(numFmt = "TEXT"), row = 1, col = 1)
addStyle(wb, sheet = 1, createStyle(numFmt = "NUMBER"), row = 3, col = 1)

saveWorkbook(wb, "example.xlsx", overwrite = TRUE)

Inspecting the Excel file reveals that despite the fact that Excel reports cell (row=3, col=1) to have category "Number", the cell is in fact still of category "Text" (as indicated by the little warning triangle).

I can't say whether this is a bug with openxlsx or not.

In my personal opinion, this is for the better. Mixing different types of data in one column is bad practice, goes against the fundamentals of R's basic data structures (as entries of elements/columns of a list/data.frame must all have the same data type), and is the opposite of "tidy data".

Upvotes: 2

Related Questions