Allan

Reputation: 35

RODBC::sqlSave() truncates data.frame columns to varchar(255) when saving to SQL Server

RODBC::sqlSave by default writes a data.frame's character columns as varchar(255) and truncates anything longer. The variables in my data.frame are actually almost 4000 characters in length.

I have tried:

sqlQuery(db,'CREATE TABLE SCUBA_tweetsC8 (user_id varchar(max), 
status_id varchar(max), 
screen_name varchar(max), 
text varchar(max),
source varchar(max),
reply_to_status_id varchar(max), 
reply_to_user_id varchar(max), 
reply_to_screen_name varchar(max), 
ext_media_type varchar(max), 
lang varchar(max),
quoted_status_id varchar(max), 
quoted_text varchar(max), 
quoted_source varchar(max), 
quoted_user_id varchar(max), 
quoted_screen_name varchar(max), 
quoted_name varchar(max), 
quoted_location varchar(max), 
quoted_description varchar(max), 
retweet_status_id varchar(max),
retweet_text varchar(max), 
retweet_source varchar(max), 
retweet_user_id varchar(max),
retweet_screen_name varchar(max), 
retweet_name varchar(max), 
retweet_location varchar(max), 
retweet_description varchar(max), 
place_url varchar(max), 
place_name varchar(max),
place_full_name varchar(max), 
place_type varchar(max), 
country varchar(max), 
country_code varchar(max), 
status_url varchar(max), 
name varchar(max),
location varchar(max), 
description varchar(max), 
url varchar(max), 
profile_url varchar(max), 
profile_expanded_url varchar(max), 
profile_banner_url varchar(max), 
profile_background_url varchar(max), 
profile_image_url varchar(max));')

sqlSave(db, SCUBA_tweetsC, "SCUBA_tweetsC8", append = TRUE)

but I get the following error:

Error in odbcUpdate(channel, query, mydata, coldata[m, ], test = test, : 'Calloc' could not allocate memory (18446744071562067968 of 1 bytes)

I have tried the other suggestions as best I could, such as using the varTypes argument of sqlSave, with no success.
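For reference, this is roughly what I tried with varTypes (a sketch; the table name SCUBA_tweetsC9 and the DSN "mydsn" are placeholders):

library(RODBC)
db <- odbcConnect("mydsn")
## map every column to varchar(max) so sqlSave creates the table with those types
vt <- setNames(rep("varchar(max)", ncol(SCUBA_tweetsC)), names(SCUBA_tweetsC))
sqlSave(db, SCUBA_tweetsC, "SCUBA_tweetsC9", rownames = FALSE, varTypes = vt)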

Upvotes: 3

Views: 383

Answers (1)

r2evans

Reputation: 160447

I don't use RODBC, so I cannot test or reproduce your problem directly, but I'll reproduce your situation with DBI and odbc and show that in my environment it does not fail.

Sample data:

library(tibble)
dat <- tibble(id = 1:2, chr = c(strrep("A", 4000), strrep("B", 400000)))
nchar(dat$chr)
# [1]   4000 400000

library(DBI)
# library(odbc) # no need to load, but need it installed/available
con <- DBI::dbConnect(odbc::odbc(), driver = "ODBC Driver 17 for SQL Server",
                      database = "mydb", server = "111.222.333.444,1433",
                      uid = "myuser", pwd = "mypassword")

(I'm not going to go into all of the options required for this.)

Manually-defined table

DBI::dbExecute(con, "drop table if exists r2test")
# [1] 0
DBI::dbExecute(con, "create table r2test (id int, chr nvarchar(max))")
# [1] 0
system.time(
  DBI::dbWriteTable(con, "r2test", dat, append = TRUE)
)
#    user  system elapsed 
#    0.00    0.02    1.28 
dat2 <- DBI::dbGetQuery(con, "select id, chr from r2test")
nchar(dat2$chr)
# [1]   4000 400000
str(dat2)
# 'data.frame': 2 obs. of  2 variables:
#  $ id : int  1 2
#  $ chr: chr  "AAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAA"| __truncated__ "BBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBB"| __truncated__

Justifying pre-creating the table

I find two possible error scenarios with SQL Server and DBI.

DBI::dbExecute(con, "drop table if exists r2test")
### also with DBI::dbCreateTable(con, "r2test", dat)
DBI::dbWriteTable(con, "r2test", dat, create = TRUE)
# Error: nanodbc/nanodbc.cpp:1617: 42000: [Microsoft][ODBC Driver 17 for SQL Server][SQL Server]Incorrect syntax near '4e+05'.  [Microsoft][ODBC Driver 17 for SQL Server][SQL Server]Statement(s) could not be prepared. 
# <SQL> 'CREATE TABLE "r2test" (
#   "id" INT,
#   "chr" varchar(4e+05)
# )
# '

This is because SQL Server appears not to like scientific notation for field sizes. We can avoid this by changing scipen.
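A quick illustration in plain base R of how that size ends up in the generated SQL:

format(400000)
# [1] "4e+05"

Raising scipen makes R prefer fixed notation when the column size is deparsed: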

options(scipen=99)
DBI::dbWriteTable(con, "r2test", dat, create = TRUE)
# Error: nanodbc/nanodbc.cpp:1617: 42000: [Microsoft][ODBC Driver 17 for SQL Server][SQL Server]The size (400000) given to the column 'chr' exceeds the maximum allowed for any data type (8000).  [Microsoft][ODBC Driver 17 for SQL Server][SQL Server]Statement(s) could not be prepared. 
# <SQL> 'CREATE TABLE "r2test" (
#   "id" INT,
#   "chr" varchar(400000)
# )
# '

And now we see that SQL Server doesn't like explicit sizes that large, so we need to encourage it to use varchar(max).
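(As an aside, the odbc backend's dbWriteTable also accepts a field.types argument, which should let us force the type in one step; a sketch I have not re-run here:)

DBI::dbExecute(con, "drop table if exists r2test")
# override only the derived type for 'chr'; 'id' is still derived automatically
DBI::dbWriteTable(con, "r2test", dat, field.types = c(chr = "nvarchar(max)"))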

Pre-created table

DBI::dbExecute(con, "drop table if exists r2test")
DBI::dbCreateTable(con, "r2test", fields = c(id = "INT", chr = "nvarchar(max)"))
system.time(
  DBI::dbWriteTable(con, "r2test", dat, append = TRUE)
)
#    user  system elapsed 
#    0.00    0.01    1.34 
dat3 <- DBI::dbGetQuery(con, "select id, chr from r2test")
nchar(dat3$chr)
# [1]   4000 400000
str(dat3)
# 'data.frame': 2 obs. of  2 variables:
#  $ id : int  1 2
#  $ chr: chr  "AAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAA"| __truncated__ "BBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBB"| __truncated__

Note about "large fields"

When using the Microsoft ODBC driver for SQL Server, one must always select "large" fields last in the query. For instance,

DBI::dbGetQuery(con, "select chr, id from r2test")
# Error in result_fetch(res@ptr, n) : 
#   nanodbc/nanodbc.cpp:2966: 07009: [Microsoft][ODBC Driver 17 for SQL Server]Invalid Descriptor Index 

This is a known issue with Microsoft's ODBC driver for SQL Server (driver "ODBC Driver 17 for SQL Server"). The problem has been around for years. No other driver I have tried (including FreeTDS, which also connects to SQL Server) is prone to this "feature". There is no indication that this will ever change; in fact, it is "formal" in the documentation, though "large" is not quantified.

I have no idea if RODBC has this issue as well; since it does not use nanodbc, it might use the SQLGetData function a little more intelligently, side-stepping the problem.

Ways to work around this problem:

  • always put "large data" at the end of your list of selected columns, reordering in R afterwards if needed (as sketched after this list);
  • use FreeTDS instead of Microsoft's ODBC drivers ... allegedly it is slightly slower (perhaps 10%), but I have successfully installed it on Windows and Linux and selected fields in arbitrary orders without problem;
  • use RStudio's professional drivers, if you have the right OS and one of RStudio's professional products;
  • do all queries with "large data" using Microsoft's bulk tools (bcp or sqlcmd); I believe they both deal with it better, though this is much less interactive than the R console;
  • use RODBC (allegedly ... again, I don't know);
  • don't use "large data" fields (...(max), or anything larger than ...(255) ... the threshold is not well defined) ... perhaps not an option. Recently, a PR (odbc!415) finalized the ability to work around this large-field problem, so a GitHub install of the package (until it is released on CRAN) will work; or
  • use a different DBMS than SQL Server ... perhaps not an option.
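For the first workaround, a minimal sketch using the example table above; the reordering happens in R after the fetch, so the driver still sees the "large" column last:

dat4 <- DBI::dbGetQuery(con, "select id, chr from r2test")  # "large" column last in SQL
dat4 <- dat4[, c("chr", "id")]                              # reorder in R afterwards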


Upvotes: 3
