Thomas

Reputation: 111

How to handle binary strings in R?

R cannot cope with nul bytes (\0) in character strings. Does anyone know how to handle this? More concretely, I want to store complex R objects in a database using an ODBC or JDBC connection. Since complex R objects cannot easily be mapped to data frames, I need a different way to store them. An example of such an object:

library(kernlab)
data(iris)
model <- ksvm(Species ~ ., data=iris, type="C-bsvc", kernel="rbfdot", kpar="automatic", C=10) 

Because model cannot be stored directly in a database, I use the serialize() function to get a binary representation of the object (in order to store it in a BLOB column):

 serialModel <- serialize(model, NULL)

Now I would like to store this via ODBC/JDBC. To do so, I need a string representation of the object in order to send a query to the database, e.g. INSERT INTO. Since the result is a raw vector, I need to convert it:

 stringModel <- rawToChar(serialModel)

And there is the problem:

Error in rawToChar(serialModel) : 
  embedded nul in string: 'X\n\0\0\0\002\0\002\v\0......

R is not able to deal with \0 in strings. Does anyone have an idea how to bypass this restriction? Or is there perhaps a completely different approach to achieve this goal?

Thanks in advance

Upvotes: 11

Views: 4709

Answers (2)

Joris Meys

Reputation: 108593

You need

stringModel <- as.character(serialModel)

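for a character representation of the raw bytes: as.character() returns one two-digit hexadecimal code per byte. rawToChar() instead tries to interpret the bytes as characters, which fails on the embedded nul bytes and is not what you want in this case.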
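To illustrate the difference between the two conversions, here is a minimal sketch in base R. It uses a small integer vector instead of the ksvm model purely to keep the example short; the variable names are only illustrative.

```r
# Serialize a small object; any R object would do here.
bytes <- serialize(1:3, NULL)

# The serialized form contains zero bytes, which R strings cannot hold.
has_nul <- any(bytes == as.raw(0))

# rawToChar() therefore fails; capture the error message instead of stopping.
err <- tryCatch(rawToChar(bytes), error = function(e) conditionMessage(e))

# as.character() instead yields one two-digit hex code per byte.
hex <- as.character(bytes)
head(hex)
```

Here hex is a character vector with one element per byte, so it has the same length as bytes.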

The resulting stringModel can later be converted back to the original model with:

newSerialModel <- as.raw(as.hexmode(stringModel))
newModel <- unserialize(newSerialModel)
all.equal(model,newModel)
[1] TRUE
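Since a query needs a single string rather than a character vector, the hex codes can be pasted together and split again on the way back. The following is only a sketch under that assumption, using a small list in place of the model; whether and how a given database accepts such a hex string for a BLOB or text column depends on the database and driver and is not covered here.

```r
obj <- list(a = 1:3, b = "text")

# Encode: serialize, then join the per-byte hex codes into one string.
hexString <- paste(as.character(serialize(obj, NULL)), collapse = "")

# Decode: split into two-character chunks and turn each back into a byte.
starts <- seq(1, nchar(hexString), by = 2)
hexPairs <- substring(hexString, starts, starts + 1)
restored <- unserialize(as.raw(strtoi(hexPairs, base = 16L)))

identical(obj, restored)
```

The string is twice as long as the raw vector, but it contains only the characters 0-9 and a-f, so no escaping is needed when it is placed in a query.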

Regarding writing binary types to databases through RODBC: as of today, the RODBC vignette reads (p. 11):

Binary types can currently only be read as such, and they are returned as column of class "ODBC binary" which is a list of raw vectors.

Upvotes: 11

IRTFM

Reputation: 263471

A completely different approach would be to simply store the output of capture.output(dput(model)) along with a descriptive name and then reconstitute it with <- or assign(). capture.output() is needed because dput() prints its result to the console rather than returning it as a string.

> dput(Mat1)
structure(list(Weight = c(7.6, 8.4, 8.6, 8.6, 1.4), Date = c("04/28/11", 
"04/29/11", "04/29/11", "04/29/11", "05/01/11"), Time = c("09:30 ", 
"03:11", "05:32", "09:53", "19:52")), .Names = c("Weight", "Date", 
"Time"), row.names = c(NA, -5L), class = "data.frame")
> y <- capture.output(dput(Mat1))
> y <- paste(y, collapse="")  # Needed because capture.output() returns one element per printed line
> dget(textConnection(y))
  Weight     Date   Time
1    7.6 04/28/11 09:30 
2    8.4 04/29/11  03:11
3    8.6 04/29/11  05:32
4    8.6 04/29/11  09:53
5    1.4 05/01/11  19:52
> new.Mat <- dget(textConnection(y))
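A variant of the same idea, as a sketch only: deparse() returns the text that dput() would print, so capture.output() is not needed, and eval(parse(text = ...)) rebuilds the object. The data frame below is a shortened, made-up stand-in for Mat1.

```r
# A small stand-in for Mat1.
df <- data.frame(Weight = c(7.6, 8.4),
                 Date = c("04/28/11", "04/29/11"),
                 stringsAsFactors = FALSE)

# deparse() returns the dput() text as a character vector, one element per line;
# joining with a space keeps tokens on adjacent lines from running together.
txt <- paste(deparse(df), collapse = " ")

# Rebuild the object from the single string.
restored <- eval(parse(text = txt))

all.equal(df, restored)
```

As with dget(), this evaluates the stored text as R code, so it should only be used on strings that come from a trusted source.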

Upvotes: 4
