GabyLP
GabyLP

Reputation: 3781

R call variable inside sqldf

I need to do a loop over sqldf statements and for that I need to call the loop variable inside the sqldf code:

My table "data", might be:

data <- read.table(text ="
    loaddate DaysRange DaysRangeNext
1 2014-03-16        30            30
2 2014-03-16         0             0
3 2014-03-16         0             0
4 2014-03-16        60            NA
5 2014-04-16        30            30
6 2014-04-16         0            30
"
,header = TRUE)

then I format loaddate as a date:

data$loaddate<-as.Date(as.character(data$loaddate), format='%Y-%m-%d')

Let's say I have a vector "loaddates":

loaddates<- unique(sort(data$loaddate))

And I need to run the following code for each loaddate:

for (i in loaddates) {

sqldf("
SELECT D.LoadDate,D.DaysRange, D.DaysRangeNext, 
COUNT(*) AS clientes
FROM data AS D
WHERE D.loaddate = i
GROUP BY D.LoadDate,D.DaysRange, D.DaysRangeNext
ORDER BY D.DaysRange, D.DaysRangeNext
")        }

But I get the followng error:

Error in sqliteSendQuery(con, statement, bind.data) : error in statement: no such column: i

Is there a way to keep the variable value and use it inside the loop?

Thanks.

EDITION:

I tried:

sqldf(
strwrap(sprintf("
SELECT D.LoadDate,D.DaysRange, D.DaysRangeNext, 
COUNT(*) AS clientes 
FROM data AS D
WHERE D.LoadDate = '%s'
GROUP BY D.LoadDate,D.DaysRange, D.DaysRangeNext
ORDER BY D.DaysRange, D.DaysRangeNext
",i),simplify=TRUE,width=1000000))

But I got:

> [1] loaddate      DaysRange     DaysRangeNext clientes      <0 rows>
> (or 0-length row.names)

Upvotes: 4

Views: 4158

Answers (4)

G. Grothendieck
G. Grothendieck

Reputation: 269491

fn$sqldf allows the use of $ within the sql statement to interpolate R variables. See Example 5 on the sqldf github home page and see the bottom of the help page ?fn for many more examples. We could reduce setNames(loaddates, loaddates) to just loaddates if we don't need names on the output.

Map(function(i)
  fn$sqldf("
    SELECT D.LoadDate,D.DaysRange, D.DaysRangeNext, 
    COUNT(*) AS clientes
    FROM data AS D
    WHERE D.loaddate = $i
    GROUP BY D.LoadDate,D.DaysRange, D.DaysRangeNext
    ORDER BY D.DaysRange, D.DaysRangeNext
  "), setNames(loaddates, loaddates))

giving:

$`2014-03-16`
    loaddate DaysRange DaysRangeNext clientes
1 2014-03-16         0             0        2
2 2014-03-16        30            30        1
3 2014-03-16        60            NA        1

$`2014-04-16`
    loaddate DaysRange DaysRangeNext clientes
1 2014-04-16         0            30        1
2 2014-04-16        30            30        1

Upvotes: 0

maRRRRk
maRRRRk

Reputation: 1

First create a new data frame. Then join to it:

num_Pcode <- as.numeric("3550")
df_Pcode_0 <- as.data.frame(num_Pcode)
df_Pcode_0
...

returns num_Pcode.

Upvotes: 0

anakin
anakin

Reputation: 337

You might get this working by defining your SQL statement within the loop but outside the function call.

for (i in loaddates) {

statement = paste( " SELECT D.LoadDate,D.DaysRange, D.DaysRangeNext,
              COUNT(*) AS clientes
              FROM data AS D
              WHERE D.loaddate = ", i,
"GROUP BY D.LoadDate,D.DaysRange, D.DaysRangeNext
ORDER BY D.DaysRange, D.DaysRangeNext " )

sqldf(statement)
}

Upvotes: 1

LyzandeR
LyzandeR

Reputation: 37879

Variable i will not be replaced in the query as it is. You need sprintf to assign a value to it. (I also don't know if you need to account for break lines but just to be sure I provide it below. Maybe you don't need for sqldf; in which case just remove strwrap).

#let's assume loaddates is the following:
loaddates <- 'something'

One way to get the query as you want, i.e. with no break lines and with i taking the values of loaddates you need:

strwrap(sprintf("
                SELECT D.LoadDate,D.DaysRange, D.DaysRangeNext, 
                COUNT(*) AS clientes
                FROM deuda AS D
                WHERE D.loaddate = '%s'
                GROUP BY D.LoadDate,D.DaysRange, D.DaysRangeNext
                ORDER BY D.DaysRange, D.DaysRangeNext
                ",i),simplify=TRUE,width=1000000)

which will output:

[1] "SELECT D.LoadDate,D.DaysRange, D.DaysRangeNext, COUNT(*) AS clientes FROM deuda AS D WHERE D.CodEmp = 'TGG' and D.loaddate = something GROUP BY D.LoadDate,D.DaysRange, D.DaysRangeNext ORDER BY D.DaysRange, D.DaysRangeNext"

Which is what you need in one line with no break lines or with the variable i unassigned.

In your loop it should be:

for (i in loaddates) {

strwrap(sprintf("
                SELECT D.LoadDate,D.DaysRange, D.DaysRangeNext, 
                COUNT(*) AS clientes
                FROM deuda AS D
                WHERE D.loaddate = '%s'
                GROUP BY D.LoadDate,D.DaysRange, D.DaysRangeNext
                ORDER BY D.DaysRange, D.DaysRangeNext
                ",i),simplify=TRUE,width=1000000)

}

With your dataset:

library(sqldf)
data <- read.table(text ="
    loaddate DaysRange DaysRangeNext
1 2014-03-16        30            30
2 2014-03-16         0             0
3 2014-03-16         0             0
4 2014-03-16        60            NA
5 2014-04-16        30            30
6 2014-04-16         0            30
"
                   ,header = TRUE,stringsAsFactors=F)

loaddates<- unique(sort(data$loaddate))

for (i in loaddates) {

  print(sqldf(
  strwrap(sprintf("
                SELECT D.LoadDate,D.DaysRange, D.DaysRangeNext, 
                COUNT(*) AS clientes
                FROM data AS D
                WHERE D.loaddate = '%s'
                GROUP BY D.LoadDate,D.DaysRange, D.DaysRangeNext
                ORDER BY D.DaysRange, D.DaysRangeNext
                ",i),simplify=TRUE,width=1000000) ))
}

Output:

    loaddate DaysRange DaysRangeNext clientes
1 2014-03-16         0             0        2
2 2014-03-16        30            30        1
3 2014-03-16        60            NA        1
    loaddate DaysRange DaysRangeNext clientes
1 2014-04-16         0            30        1
2 2014-04-16        30            30        1

Upvotes: 3

Related Questions