user2720047
user2720047

Reputation: 53

RODBC does not save greater than 8k varchar/text from a data frame

I am scraping web data and some results are more than 8K of text. I have a field in my MS SQL Server 2008 R2 that is varchar(max) but RODBC only saves 8K and truncates the rest. I have DSN connections set up using both SQL Native Client 10 from SS R2 and ODBC 11.

sqlType returns -1 for the field title, indicating varchar(max)/text.

Both connections truncate the text at 8K. Is there a fix/workaround for this or am I limited to 8K in SQL Server? I can switch to MySQL if the R mySQL driver will pass the larger amounts of text through. I'm using R 3.0.1 and RODBC 1.3-6 on Win 7 x64.

Here is some sample code to replicate the problem.

CREATE TABLE
[test](
[title] [varchar](max) NULL
) ON [PRIMARY]


library(RODBC)
ch <- odbcConnect("text", uid = "X", pwd = "X")
sqlTypeInfo(ch,"test")

testtext = data.frame("salasjflsjsf")
colnames(testtext) = "title"

## This works
sqlSave(ch,testtext,"test",append=TRUE, rownames=FALSE, colnames = FALSE, safer=FALSE,  verbose = TRUE)


## text > 8K
testtext = data.frame(paste(rep("salasjflsjsf",5000),collapse=""))
colnames(testtext) = "title"

## use ODBC default type for title
sqlSave(ch,testtext,"test",append=TRUE, rownames=FALSE, colnames = FALSE, safer=FALSE,  verbose = TRUE)

## this message from ODBC
## Query: INSERT INTO "test" ( "title" ) VALUES ( ? )
## Binding: 'title' DataType 12, ColSize 8000
## Parameters:
## no: 1: title salasjfl
## [*text removed*]
## lsjsfsalasjflsjsfsalasjflsjsfsalasjflsjsfsalasjflsjsfsalasjflsjsfsalasjflsjsf/***/
## Warning message:
## In odbcUpdate(channel, query, mydata, coldata[m, ], test = test,  :
##   character data 'salasjflsjsfsalasjflsjsfsalasjflsjsfsalasjfl




## try with varType set to text
sqlSave(ch,testtext,"test",append=TRUE, varType = "text",rownames=FALSE, colnames = FALSE, safer=FALSE,  verbose = TRUE)

## try with varType set to varchar(MAX)
sqlSave(ch,testtext,"test",append=TRUE, varType = "varchar(max)",rownames=FALSE, colnames = FALSE, safer=FALSE,  verbose = TRUE)

close(ch)

I changed the column type to text and refreshed the DSN connection. Text type is being picked up but now there is an error message:

 chtest <-odbcConnect("test2", uid = "X", pwd = "X")
    > sqlColumns(chtest,"test")
      TABLE_CAT TABLE_SCHEM TABLE_NAME COLUMN_NAME DATA_TYPE TYPE_NAME COLUMN_SIZE BUFFER_LENGTH DECIMAL_DIGITS NUM_PREC_RADIX NULLABLE REMARKS
    1   RSS            dbo       test       title        -1      text  2147483647    2147483647             NA             NA        1    <NA>
      COLUMN_DEF SQL_DATA_TYPE SQL_DATETIME_SUB CHAR_OCTET_LENGTH ORDINAL_POSITION IS_NULLABLE SS_IS_SPARSE SS_IS_COLUMN_SET SS_IS_COMPUTED
    1       <NA>            -1               NA        2147483647                1         YES            0                0              0
      SS_IS_IDENTITY SS_UDT_CATALOG_NAME SS_UDT_SCHEMA_NAME SS_UDT_ASSEMBLY_TYPE_NAME SS_XML_SCHEMACOLLECTION_CATALOG_NAME
    1              0                <NA>               <NA>                      <NA>                                 <NA>
      SS_XML_SCHEMACOLLECTION_SCHEMA_NAME SS_XML_SCHEMACOLLECTION_NAME SS_DATA_TYPE
    1                                <NA>                         <NA>           35
    > sqlSave(chtest,testtext,"test",append=TRUE, varType = "text",rownames=FALSE, colnames = FALSE, safer=FALSE,  verbose = TRUE)
    Query: INSERT INTO "test" ( "title" ) VALUES ( ? )
    Binding: 'title' DataType -1, ColSize 2147483647
    Error in odbcUpdate(channel, query, mydata, coldata[m, ], test = test,  : 
      'Calloc' could not allocate memory (18446744071562067968 of 1 bytes)
    In addition: Warning messages:
    1: In odbcUpdate(channel, query, mydata, coldata[m, ], test = test,  :
      Reached total allocation of 6013Mb: see help(memory.size)
    2: In odbcUpdate(channel, query, mydata, coldata[m, ], test = test,  :
      Reached total allocation of 6013Mb: see help(memory.size)

Upvotes: 5

Views: 2037

Answers (2)

Esben Eickhardt
Esben Eickhardt

Reputation: 3852

I had a similar problem and ended up doing the following:

# Finding the longest character
varTypes = c(col_in_question=sprintf("varchar(%s)", max(nchar(data$col_in_question))))

# Uploading data
sqlSave(channel = dbhandle, 
        dat = data, 
        tablename = "data_name",
        rownames = FALSE,
        varTypes = varTypes)

In this way you set the varchar to the max in your dataset, and if you dont have any extremely long characters (2 Gb) it should work. If you insist on varchar(max), then you can change it in your dataset afterworth using sqlQuery.

Upvotes: 0

Aert
Aert

Reputation: 2049

You defined the text column to be of type varchar(max), so it will be stored that way, regardless of specifying varType when saving the data.

Try creating a table with a TEXT column instead, then using sqlSave. This way the data will be stored out of row context and you'll shouldn't have the truncation issue.

Upvotes: 1

Related Questions