keeping leading zeros with sqldf

Question

I am a total SQL ignoramus so I apologize if this is very simple..

I have data that contains an ID column consisting of numbers, and in many cases contains leading zeros. I would like to import the data using sqldf, but in doing so I lose the leading zeros for these. Is there a way to keep the leading zeros? Maybe by somehow specifying that all columns are character classes like in R's read.table?

I can't share my data due to the nature of my work, but I am doing something like this:

a <- formatC(sample(1:99, 10), width = 8, format = "d", flag = "0")
fakeDF <- data.frame(v1=a, v2=rnorm(10, 0, 1))
f1 <- tempfile()
write.table(fakeDF, file=f1, quote=FALSE, row.names=FALSE, col.names=FALSE, sep="|")

f2 <- file(f1)
mydat <- sqldf::sqldf("SELECT * FROM f2", dbname=tempfile(), 
                      file.format=list(header=FALSE, sep="|", eol="
", skip=1))
mydat

Also, I would like to add that the length is not the same for all of these IDs. If possible, I would like to avoid having to manually pad the data with zeros after the fact..

G. Grothendieck · Accepted Answer

Use colClasses like this:

library(sqldf)

read.csv.sql(f1, header = FALSE, sep = "|", colClasses = c("character", "numeric"))

giving:

        V1         V2
1  00000029  1.7150650
2  00000078  0.4609162
3  00000040 -1.2650612
4  00000085 -0.6868529
5  00000090 -0.4456620
6  00000005  1.2240818
7  00000050  0.3598138
8  00000083  0.4007715
9  00000051  0.1106827
10 00000042 -0.5558411

Note: We used the input file generated using this random seed:

set.seed(123)

a <- formatC(sample(1:99, 10), width = 8, format = "d", flag = "0")
fakeDF <- data.frame(v1=a, v2=rnorm(10, 0, 1))
f1 <- tempfile()
write.table(fakeDF, file=f1, quote=FALSE, row.names=FALSE, col.names=FALSE, sep="|")

keeping leading zeros with sqldf

Answers (2)

Related Questions