Romm

Reputation: 21

R to SQL Server: make a query joining a database table with an R dataframe

I have an ODBC connection to a SQL Server database. From R, I want to query a table with lots of data, but retrieve only those records that match my R dataframe on certain columns (an INNER JOIN). Currently I do this in MS Access 2003 by linking the ODBC tables (linked tables "dbo_name") and running relational queries, without downloading the entire table. I need to reproduce this process in R while avoiding downloading the entire table (i.e., avoiding sqlFetch()).

I have read the documentation for the RODBC, DBI and rsqlserver packages without success. Is there any package or way to do this?

Upvotes: 1

Views: 5709

Answers (3)

superboreen

Reputation: 57

I adapted the answer above and the following worked for me without needing raw SQL syntax. The table I used was from the AdventureWorks SQL Server sample database.

# conn is an existing DBI connection to SQL Server

# Lazy, schema-qualified reference to the table (nothing is downloaded yet)
lazy_dim_customer <- dplyr::tbl(conn, dbplyr::in_schema("dbo", "DimCustomer"))

# Create data frame of customer ids

adv_customers <- dplyr::tbl(conn, "DimCustomer")

query1 <- adv_customers %>% 
  filter(CustomerKey < 20000) %>%
  select(CustomerKey)

d00df_customer_keys <- query1 %>% dplyr::collect() 

# Chunk customer ids, filter, collect and bind

id.chunks <- split(d00df_customer_keys$CustomerKey, seq(10))

result.list <- lapply(id.chunks, function(ids)
  lazy_dim_customer %>% 
    filter(CustomerKey %in% ids) %>% 
    select(CustomerKey, FirstName, LastName) %>%
    collect() )

combined.results <- do.call(rbind, result.list)
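For the original goal (joining a local data frame against a server table without downloading it), dbplyr can also copy the local frame into a temporary table and run the join on the server. A minimal sketch, assuming conn is a live DBI connection and local_keys is a hypothetical local data frame with a CustomerKey column:

```r
library(dplyr)
library(dbplyr)

# Copy the local data frame into a temporary table on the server
lazy_keys <- copy_to(conn, local_keys, name = "tmp_keys", temporary = TRUE)

# Lazy reference to the big table; the join executes on the server
lazy_dim_customer <- tbl(conn, in_schema("dbo", "DimCustomer"))

result <- lazy_dim_customer %>%
  inner_join(lazy_keys, by = "CustomerKey") %>%
  collect()  # only the matching rows are downloaded
```

This avoids both the giant WHERE clause and the chunking loop, at the cost of needing permission to create a temporary table.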

Upvotes: 0

nograpes

Reputation: 18323

If you can't write a table to the database, there is another trick you can use: you essentially build a giant WHERE clause. Let's say you want to join the table named table in the database to your data.frame called a on the column id. You could say:

ids <- paste0(a$id, collapse=',')
# If a$id is a character, you'll have to surround each value in quotes:
# ids <- paste0(paste0("'", a$id, "'"), collapse=',')
dbGetQuery(con, paste0('SELECT * FROM table WHERE id IN (', ids, ')'))
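As a side note, hand-building the quoted list is fragile if any id contains a single quote. A minimal base-R sketch (quote_id is a hypothetical helper) that doubles embedded single quotes, as SQL expects:

```r
# Quote a character vector for use in a SQL IN (...) list:
# double any embedded single quotes, then wrap each value in quotes
quote_id <- function(x) paste0("'", gsub("'", "''", x), "'")

ids <- paste(quote_id(c("A1", "O'Brien")), collapse = ",")
# ids is now: 'A1','O''Brien'
sql <- paste0("SELECT * FROM table WHERE id IN (", ids, ")")
```

For anything user-supplied, a parameterised query (e.g. DBI::sqlInterpolate) is safer than string pasting.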

From your comment, it seems that SQL Server has a problem with a query of that size. I suspect that you may have to "chunk" the query into smaller bits and then combine the results. Here is an example that splits the ids into 1000 chunks, queries each, and then combines them.

id.chunks <- split(a$id, seq(1000))
result.list <- lapply(id.chunks, function(ids)
                 dbGetQuery(con,
                            paste0('SELECT * FROM table WHERE id IN (',
                                   paste(ids, collapse=','), ')')))
combined.results <- do.call(rbind, result.list)

Upvotes: 2

Romm

Reputation: 21

The problem is solved. The ids vector was split into 1,000 chunks, and each chunk was queried against the server. I show the unorthodox code. Thanks nograpes!

# "lani1" is the vector with 395,474 ids
id.chunks <- split(lani1, seq(1000))
for (i in 1:length(id.chunks)) {
  idsi <- paste0(paste0("'", as.vector(unlist(id.chunks[i])), "'"), collapse=',')
  if (i == 1) {
    ani <- sqlQuery(riia, paste0('SELECT * FROM T_ANIMALES WHERE an_id IN (', idsi, ')'))
  } else {
    ani1 <- sqlQuery(riia, paste0('SELECT * FROM T_ANIMALES WHERE an_id IN (', idsi, ')'))
    ani <- rbind(ani, ani1)
  }
}
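One subtlety: split(x, seq(1000)) produces 1,000 interleaved groups rather than groups of 1,000 consecutive ids, and warns when the vector length is not a multiple of 1,000. If fixed-size consecutive chunks are wanted instead, a minimal base-R sketch (chunk is a hypothetical helper):

```r
# Split a vector into consecutive chunks of at most `size` elements
chunk <- function(x, size) split(x, ceiling(seq_along(x) / size))

chunks <- chunk(seq_len(2500), 1000)
# 3 chunks: 1000, 1000 and 500 elements
```

Either grouping works for this query, since the chunks are only used to limit the length of each IN list.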

Upvotes: 0
