Reputation: 43
I'm passing a vector of web pages, read from an XLSX file, to a SQL statement to be used in BigQuery but am coming up with an error when using the IN operator.
library(xlsx)
library(dplyr)
library(bigrquery)
test_control_group <- xlsx::read.xlsx('./test_control_group.xlsx',
sheetName = 'Test_Control')
pages <- test_control_group %>%
dplyr::select(page) %>%
distinct()
projectid <- "googleproject"
sql <- paste0(
"SELECT * FROM `googleproject.dataset.table`
WHERE page IN (", pages, ")"
)
tb <- bigrquery::bq_project_query(projectid, sql)
I get the error:
Error: Job 'googleproject.job_bheGLdlR4_WqQ4iMu2G8UcFNscEc.europe-west2' failed
x Function not found: c at [5:24] [invalidQuery]
Since it is recognising R's c() notation.
"\n SELECT * FROM `googleproject.dataset.table`\n WHERE page IN (c(\"https:...))"
How do I get around this problem please?
Upvotes: 0
Views: 496
Reputation: 43
Adding the following lines to the code worked.
pages <- dput(as.character(pages))
pages <- str_replace_all(pages, "(c\\(|\\))","")
So the script now looks like:
library(xlsx)
library(dplyr)
library(bigrquery)
test_control_group <- xlsx::read.xlsx('./test_control_group.xlsx',
sheetName = 'Test_Control')
pages <- test_control_group %>%
dplyr::select(page) %>%
distinct()
pages <- dput(as.character(pages))
pages <- str_replace_all(pages, "(c\\(|\\))","")
projectid <- "googleproject"
sql <- paste0(
"SELECT * FROM `googleproject.dataset.table`
WHERE page IN (", pages, ")"
)
tb <- bigrquery::bq_project_query(projectid, sql)
Upvotes: 1
Reputation: 1780
here I suggest you check these options.
The variable pages save in a variable to get the results in this variable and this new variable concatenate in the SQL query . You can see more information about c Function
You can see this example.
#Save the c function to a variable
x2 <- c(\"https:...”)
#use in the query statement
sql <- paste0( "SELECT * FROM `googleproject.dataset.table` WHERE page IN (", x2, ")" )
You can use the unnest statement with BigQuery and isnet the data in a new table and manipulate all the information in BigQuery. You can see more information about the UNNEST statement.
You can see this example.
SELECT *
FROM UNNEST(['foo', 'bar', 'baz', 'qux', 'corge', 'garply', 'waldo', 'fred'])
AS element
WITH OFFSET AS offset
ORDER BY offset;
+----------+--------+
| element | offset |
+----------+--------+
| foo | 0 |
| bar | 1 |
| baz | 2 |
| qux | 3 |
| corge | 4 |
| garply | 5 |
| waldo | 6 |
| fred | 7 |
+----------+--------+
Maybe you could insert the data from the excel to BigQuery in another table and manipulate all the information from the database. You can see more information about export to BigQuery.
Upvotes: 0