tmer22
tmer22

Reputation: 11

R function that takes in CL argument and queries SQL database

Brand new to SQL and SQLite here. I'm trying to create a function in R studio that takes in an argument from the command line and queries an SQL database to see if the record exists in one specific column, displaying a message to the user whether the record was found or not (I have a table within this database, lets call it my_table, that has 2 columns, we'll name them column_1 and column_2. column_1 has ID numbers and column_2 has names that are associated with those ID numbers, and there are a total of about 700 rows).

So far I have something that looks like this:

my_function() <- function(CL_record) { db <- dbConnect(SQLite(), dbname = "mysql.db") dbGetQuery(db, sql("SELECT * FROM my_table WHERE column_2 == @CL_record")) }

But this is obviously not the right way to go about it and I keep getting errors thrown regarding invalid (NULL) left side of assignment.

Any help here would be much appreciated.

Upvotes: 1

Views: 170

Answers (1)

r2evans
r2evans

Reputation: 160437

I recommend using parameterized queries, something like:

my_function <- function(CL_record) {
  db <- dbConnect(SQLite(), dbname = "mysql.db")
  on.exit(dbDisconnect(db), add = TRUE)
  dbGetQuery(db, "SELECT * FROM my_table WHERE column_2 = ?",
             params = list(CL_record))
}

The params= argument does not need to be a list(.), it works just fine here as params=CL_record, but if you have two or more, and especially if they are of different classes (strings vs numbers), then you really should list them.

You'll see many suggestions, or even howtos/tutorials that suggest using paste or similar for adding parameters to the query. There are at least two reasons for not using paste or sprintf:

  1. Whether malicious or inadvertent (e.g., typos), sql injection is something that should be actively avoided. Even if your code is not public-facing, accidents happen.

  2. Most (all?) DBMSes have query optimization/compiling. When it receives a query it has not seen before, it parses and attempts to optimize the query; when it sees a repeat query, it can re-use the previous optimization, saving time. When you paste or sprintf an argument into a query, each query is different from the previous, so it is overall less efficient. When using bound parameters, the query itself does not change (just its parameters), so we can take advantage of the compiling/optimization.

Upvotes: 1

Related Questions