jsta
jsta

Reputation: 3403

How to insert list of data.frame(s) to database?

I want to make an SQLite database from a list of data.frame(s) using the dplyr package. It looks like dplyr::copy_to is what I need. I think the problem is related to NSE. Also see the vignette on dplyr with databases.

data(iris)
data(cars)

res <- list("iris" = iris, "cars" = cars)

my_db <- dplyr::src_sqlite(paste0(tempdir(), "/foobar.sqlite3"),
                       create = TRUE)

lapply(res, function(x) dplyr::copy_to(my_db, x))

Error: Table x already exists.

Upvotes: 0

Views: 763

Answers (2)

Julian
Julian

Reputation: 9310

If someone prefers a {purrr} option:

imap(res, ~dplyr::copy_to(dest = my_db, 
                            df = .x,
                            name = .y,
                            overwrite = TRUE))

or

imap(res, ~dbWriteTable(conn = my_db, .y,
                                        value = .x,
                                       overwrite = TRUE))

Upvotes: 0

Michael Griffiths
Michael Griffiths

Reputation: 1427

The reason for this is because the default table name is based off of the name of the data frame in R. When using lapply, it does not take the index name.

The documentation for dplyr::copy_to.src_sql contains:

## S3 method for class 'src_sql'
copy_to(dest, df, name = deparse(substitute(df)),
  types = NULL, temporary = TRUE, unique_indexes = NULL, indexes = NULL,
  analyze = TRUE, ...)

The line name = deparse(substitute(df)) shows where the table name comes from.

We can see what that turns into:

res <- list("iris" = iris, "cars" = cars)
tmp = lapply(res, function(x) print(deparse(substitute(x))))
#> [1] "X[[i]]"
#> [1] "X[[i]]"

The name for the table in the SQLite source is X[[i]]; once the first line has been executed, the table already exists.

We can resolve by either using an explicit for loop and passing the name secondarily, or by using lapply on an index number.

For instance:

res <- list("iris" = iris, "cars" = cars)
my_db <- dplyr::src_sqlite(paste0(tempdir(), "/foobar.sqlite3"),
                           create = TRUE)
lapply(seq_along(res), function(i, l){dplyr::copy_to(my_db, l[[i]], names(l)[[i]])}, l = res)
my_db %>% tbl("iris") %>% head

#> Source:   query [?? x 5]
#> Database: sqlite 3.8.6 
#> 
#>   Sepal.Length Sepal.Width Petal.Length Petal.Width Species
#>          <dbl>       <dbl>        <dbl>       <dbl>   <chr>
#> 1          5.1         3.5          1.4         0.2  setosa
#> 2          4.9         3.0          1.4         0.2  setosa
#> 3          4.7         3.2          1.3         0.2  setosa
#> 4          4.6         3.1          1.5         0.2  setosa
#> 5          5.0         3.6          1.4         0.2  setosa
#> 6          5.4         3.9          1.7         0.4  setosa

my_db %>% tbl("cars") %>% head

#> Source:   query [?? x 2]
#> Database: sqlite 3.8.6 
#> 
#>   speed  dist
#>   <dbl> <dbl>
#> 1     4     2
#> 2     4    10
#> 3     7     4
#> 4     7    22
#> 5     8    16
#> 6     9    10

Upvotes: 1

Related Questions