pfoe
pfoe

Reputation: 3

write_csv - Exporting trailing spaces (no elimination)

I am trying to export a table to CSV format, but one of my columns is special - it's like a number string except that the length of the string needs to be the same every time, so I add trailing spaces to shorter numbers to get it to a certain length (in this case I make it length 5).

library(dplyr)
library(readr)

df <- read.table(text="ID   Something
22  Red
                 55555  Red
                 123    Blue
                 ",header=T)

df <- mutate(df,ID=str_pad(ID,5,"right"," "))

df

     ID Something
1 22          Red
2 55555       Red
3 123        Blue

Unfortunately, when I try to do write_csv somewhere, the trailing spaces disappear which is not good for what I want to use this for. I think it's because I am downloading the csv from the R server and then opening it in Excel, which messes around with the data. Any tips?

Upvotes: 0

Views: 397

Answers (1)

bmonger
bmonger

Reputation: 77

str_pad() appears to be a function from stringr package, which is not currently available for R 3.5.0 which I am using - this may be the cause of your issues as well. If it the function actually works for you, please ignore the next step and skip straight to my Excel comments below

Adding spaces. Here is how I have accomplished this task with base R

# a custom function to add arbitrary number of trailing spaces 
SpaceAdd <- function(x, desiredLength = 5) {
  additionalSpaces <- ifelse(nchar(x) < desiredLength, 
                             paste(rep(" ", desiredLength - nchar(x)), collapse = ""), "")
  paste(x, additionalSpaces, sep="")
}

# use the function on your df
df$ID <- mapply(df$ID, FUN = SpaceAdd)

# write csv normally
write.csv(df, "df.csv")

NOTE When you import to Excel, you should be using the 'import from text' wizard rather than just opening the .csv. This is because you need marking your 'ID' column as text in order to keep the spaces

NOTE 2 I have learned today, that having your first column named 'ID' might actually cause further problems with excel, since it may misinterpret the nature of the file, and treat it as SYLK file instead. So it may be best avoiding this column name if possible.

Here is a wiki tl;dr:

A commonly encountered (and spurious) 'occurrence' of the SYLK file happens when a comma-separated value (CSV) format is saved with an unquoted first field name of 'ID', that is the first two characters match the first two characters of the SYLK file format. Microsoft Excel (at least to Office 2016) will then emit misleading error messages relating to the format of the file, such as "The file you are trying to open, 'x.csv', is in a different format than specified by the file extension..."

details: https://en.wikipedia.org/wiki/SYmbolic_LinK_(SYLK)

Upvotes: 2

Related Questions