Reputation: 35
RODBC:sqlsave is default writing a data.frame with only char columns as varchar(255)
and truncating the rest. The var in the DF are actually almost 4000 char in length.
I have tried :
sqlQuery(db,'CREATE TABLE SCUBA_tweetsC8 (user_id``` varchar(max),
status_id varchar(max),
screen_name varchar(max),
text varchar(max),
source varchar(max),
reply_to_status_id varchar(max),
reply_to_user_id varchar(max),
reply_to_screen_name varchar(max),
ext_media_type varchar(max),
lang varchar(max),
quoted_status_id varchar(max),
quoted_text varchar(max),
quoted_source varchar(max),
quoted_user_id varchar(max),
quoted_screen_name varchar(max),
quoted_name varchar(max),
quoted_location varchar(max),
quoted_description varchar(max),
retweet_status_id varchar(max),
retweet_text varchar(max),
retweet_source varchar(max),
retweet_user_id varchar(max),
retweet_screen_name varchar(max),
retweet_name varchar(max),
retweet_location varchar(max),
retweet_description varchar(max),
place_url varchar(max),
place_name varchar(max),
place_full_name varchar(max),
place_type varchar(max),
country varchar(max),
country_code varchar(max),
status_url varchar(max),
name varchar(max),
location varchar(max),
description varchar(max),
url varchar(max),
profile_url varchar(max),
profile_expanded_url varchar(max),
profile_banner_url varchar(max),
profile_background_url varchar(max),
profile_image_url varchar(max),);')
sqlSave(db,SCUBA_tweetsC,"SCUBA_tweetsC8",append = T)
but I get the following error:
Error in odbcUpdate(channel, query, mydata, coldata[m, ], test = test, : 'Calloc' could not allocate memory (18446744071562067968 of 1 bytes)
I have tried the best I could the other suggestions such as trying to use the VarType function with no success.
Upvotes: 3
Views: 383
Reputation: 160447
I don't use RODBC
, so I cannot test/reproduce your problem, but I'll try to reproduce your situation and show that in my environment it does not fail.
Sample data:
library(tibble)
dat <- tibble(id = 1:2, chr = c(strrep("A", 4000), strrep("B", 400000)))
nchar(dat$chr)
# [1] 4000 400000
library(DBI)
# library(odbc) # no need to load, but need it installed/available
con <- DBI::dbConnect(odbc::odbc(), driver = "ODBC Driver 17 for SQL Server",
database = "mydb", server = "111.222.333.444,1433",
uid = "myuser", pwd = "mypassword")
(I'm not going to go into all of the options required for this.)
DBI::dbExecute(con, "drop table if exists r2test")
# [1] 0
DBI::dbExecute(con, "create table r2test (id int, chr nvarchar(max))")
# [1] 0
system.time(
DBI::dbWriteTable(con, "r2test", dat, append = TRUE)
)
# user system elapsed
# 0.00 0.02 1.28
dat2 <- DBI::dbGetQuery(con, "select id, chr from r2test")
nchar(dat2$chr)
# [1] 4000 400000
str(dat2)
# 'data.frame': 2 obs. of 2 variables:
# $ id : int 1 2
# $ chr: chr "AAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAA"| __truncated__ "BBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBB"| __truncated__
CreateTable
I find two possible error scenarios with SQL Server and DBI
.
DBI::dbExecute(con, "drop table if exists r2test")
### also with DBI::dbCreateTable(con2, "r2test", dat)
DBI::dbWriteTable(con, "r2test", dat, create = TRUE)
# Error: nanodbc/nanodbc.cpp:1617: 42000: [Microsoft][ODBC Driver 17 for SQL Server][SQL Server]Incorrect syntax near '4e+05'. [Microsoft][ODBC Driver 17 for SQL Server][SQL Server]Statement(s) could not be prepared.
# <SQL> 'CREATE TABLE "r2test" (
# "id" INT,
# "chr" varchar(4e+05)
# )
# '
This is because SQL Server appears to not like scientific notation for field sizes. We can avoid this by changing scipen
:
options(scipen=99)
DBI::dbWriteTable(con, "r2test", dat, create = TRUE)
# Error: nanodbc/nanodbc.cpp:1617: 42000: [Microsoft][ODBC Driver 17 for SQL Server][SQL Server]The size (400000) given to the column 'chr' exceeds the maximum allowed for any data type (8000). [Microsoft][ODBC Driver 17 for SQL Server][SQL Server]Statement(s) could not be prepared.
# <SQL> 'CREATE TABLE "r2test" (
# "id" INT,
# "chr" varchar(400000)
# )
# '
And now we see that SQL Server doesn't like explicit sizes that large, so we need to encourage it to use varchar(max)
.
DBI::dbExecute(con, "drop table if exists r2test")
DBI::dbCreateTable(con2, "r2test", fields = c(id="INT", chr="nvarchar(max)"))
system.time(
DBI::dbWriteTable(con, "r2test", dat, append = TRUE)
)
# user system elapsed
# 0.00 0.01 1.34
dat3 <- DBI::dbGetQuery(con, "select id, chr from r2test")
nchar(dat3$chr)
# [1] 4000 400000
str(dat3)
# 'data.frame': 2 obs. of 2 variables:
# $ id : int 1 2
# $ chr: chr "AAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAA"| __truncated__ "BBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBB"| __truncated__
When using the Microsoft ODBC driver for SQL Server, one must always select "large" fields last in the query. For instance,
DBI::dbGetQuery(con, "select chr, id from r2test")
# Error in result_fetch(res@ptr, n) :
# nanodbc/nanodbc.cpp:2966: 07009: [Microsoft][ODBC Driver 17 for SQL Server]Invalid Descriptor Index
This is a known thing with MS's odbc driver for SQL Server (driver "ODBC Driver 17 for SQL Server"). The problem has been around for years. No other driver (including FreeTDS, which connects to SQL Server) is prone to this "feature". There is no indication that this will ever change (in fact, it's "formal" in the documentation, though "large" is not quantified).
I have no idea if RODBC
has this issue as well; since it does not use nanodbc
, it might work with the SQLGetData
function a little more intelligently, side-stepping the problem.
Ways to work around this problem:
bcp
or sqlcmd
), I believe they both deal better with it, though this is much less interactive than on the R console;RODBC
(allegedly ... again, I don't know);...(max)
or anything larger than ...(255)
... a not-well-defined number) ... perhaps not an option;
-- RECENTLY, a PR (odbc!415) has finalized the ability to workaround this large-field problem, so a github-install of the package (until released on CRAN) will work; orReferences:
In the odbc
package, issue odbc/#10 appears to be the first appearance, and is referenced in all others in this package; related closed issues (#82,
#86,
#112,
#171,
#256,
#331); and some recent discussion perhaps to work-around this problem (#309,
#358,
#373
)
Ultimately, odbc
uses the nanodbc
C++ library, and while they recognize the problem, they feel it is not theirs to fix (nanodbc/#149).
The source document at Microsoft: https://learn.microsoft.com/en-us/sql/odbc/reference/develop-app/getting-long-data
Upvotes: 3