Reputation: 419
Is it possible to achieve the result of the following code in a more efficient and concise way? Due to the nature of my task, I cannot use base or tidyr functions, because the dplyr code needs to be translated to SQL and executed on a database.
library(dplyr)
library(dbplyr)
library(RSQLite)
library(DBI)
# Create example data set
id <- c("a", "b", "c")
df <- data.frame(id)
# Treat it as a data base table
con <- DBI::dbConnect(RSQLite::SQLite(), path = ":memory:")
copy_to(con, df, "data", temporary = FALSE)
# Expand to data set and create a variable for four quarters
n <- 4
data <- tbl(con, "data") %>%
mutate(quarter = 1)
for (i in 2:n) {
data <- data %>%
mutate(quarter = i) %>%
union(data, data) %>%
show_query()
}
data <- collect(data)
My goal in my real life example is to query a list of IDs and to expand it to a data set with a variable "quarter". I want to use that list as a basis to successively join more information later.
Upvotes: 1
Views: 230
Reputation: 6931
It sounds like you want the Cartesian product of id = c('a', 'b', 'c')
and quarters = c(1, 2, 3, 4)
, which would give you id_quarter = c(('a',1), ('a',2), ('a',3), ..., ('c',4))
.
This can be done using a join on a dummy variable like so:
id <- c("a", "b", "c")
df <- data.frame(id)
quarter <- c(1, 2, 3, 4)
df_q <- data.frame(quarter)
# Treat it as a data base table
con <- DBI::dbConnect(RSQLite::SQLite(), path = ":memory:")
copy_to(con, df, "data", temporary = FALSE)
copy_to(con, df_q, "quarter", temporary = FALSE)
# create placeholder column
data <- tbl(con, "data") %>%
mutate(dummy_placeholder = 1)
quarters <- tbl(con, "quarter") %>%
mutate(dummy_placeholder = 1)
# join and collect
result <- data %>%
inner_join(quarter, by = "dummy_placeholder") %>%
collect()
Upvotes: 1