Reputation: 1355
I am a total SQL ignoramus so I apologize if this is very simple..
I have data that contains an ID column consisting of numbers, and in many cases contains leading zeros. I would like to import the data using sqldf, but in doing so I lose the leading zeros for these. Is there a way to keep the leading zeros? Maybe by somehow specifying that all columns are character classes like in R's read.table?
I can't share my data due to the nature of my work, but I am doing something like this:
a <- formatC(sample(1:99, 10), width = 8, format = "d", flag = "0")
fakeDF <- data.frame(v1=a, v2=rnorm(10, 0, 1))
f1 <- tempfile()
write.table(fakeDF, file=f1, quote=FALSE, row.names=FALSE, col.names=FALSE, sep="|")
f2 <- file(f1)
mydat <- sqldf::sqldf("SELECT * FROM f2", dbname=tempfile(),
file.format=list(header=FALSE, sep="|", eol="\n", skip=1))
mydat
Also, I would like to add that the length is not the same for all of these IDs. If possible, I would like to avoid having to manually pad the data with zeros after the fact..
Upvotes: 0
Views: 463
Reputation: 107652
One way to run leading zeros is using SQL string functions. Just impose an amount of zeros higher than your desired string length, concatenate with your actual ID field, and strip from the rightmost character the specified length of column you require. Below uses 8 characters as string length:
mydat <- sqldf::sqldf("select rightstr('0000000000000' || ID, 8) As LeadZeroID,
* from f2;",
dbname=tempfile(),
file.format=list(header=FALSE, sep="|", eol="\n", skip=1))
Upvotes: 0
Reputation: 269714
Use colClasses
like this:
library(sqldf)
read.csv.sql(f1, header = FALSE, sep = "|", colClasses = c("character", "numeric"))
giving:
V1 V2
1 00000029 1.7150650
2 00000078 0.4609162
3 00000040 -1.2650612
4 00000085 -0.6868529
5 00000090 -0.4456620
6 00000005 1.2240818
7 00000050 0.3598138
8 00000083 0.4007715
9 00000051 0.1106827
10 00000042 -0.5558411
Note: We used the input file generated using this random seed:
set.seed(123)
a <- formatC(sample(1:99, 10), width = 8, format = "d", flag = "0")
fakeDF <- data.frame(v1=a, v2=rnorm(10, 0, 1))
f1 <- tempfile()
write.table(fakeDF, file=f1, quote=FALSE, row.names=FALSE, col.names=FALSE, sep="|")
Upvotes: 2