user220419

Reputation: 569

How do I correctly close a connection in R, so its connection 'slot' gets released?

I am using readLines(text url) in a script that calls it several hundred times, each time with a unique text url.

After about 125 calls to readLines(text url) I got an error, "all connections are in use."

When I check my open connections with showConnections(all=TRUE), for the url connections I see:

 description     class ... isopen
 "www.site.com"  "url" ... "closed" ...

How do I remove these closed connections from the R environment so I can open new connections?

I've also tried opening the urls beforehand, passing the url connection into readLines, and closing the connection once I'm done with it, and I still run into the same problem.

Upvotes: 10

Views: 14252

Answers (2)

deeenes

Reputation: 4576

Hadley's answer did not work for me, for two reasons: 1) close is an S3 generic and has no method for character objects (maybe this has changed since the answer was written?); 2) my connection was not in the table returned by showConnections. It was a curl-type connection left open by readr::read_tsv after it ran into an expired SSL certificate error. When the warnings concern connections created by third-party packages, we have to obtain the connection objects before we can close them. I wrote two small functions for this purpose. They import more packages than strictly needed because I designed them for a package, but you can easily remove these dependencies.

library(magrittr)
library(tibble)
library(dplyr)
library(purrr)

#' Retrieve the open connection(s) pointing to URI
#'
#' @param uri Character: path or URL the connection points to.
#'
#' @return A list of connection objects.
#'
#' @importFrom magrittr %>%
#' @importFrom tibble rownames_to_column
#' @importFrom dplyr filter pull
#' @importFrom purrr map
#' @noRd
get_connections <- function(uri){

    showConnections(all = TRUE) %>%
    as.data.frame %>%
    rownames_to_column('con_id') %>%
    filter(description == uri) %>%
    pull(con_id) %>%
    as.integer %>%
    map(getConnection)

}


#' Closes the open connection(s) pointing to URI
#'
#' @param uri Character: path or URL the connection points to.
#'
#' @return Invisible `NULL`.
#'
#' @importFrom magrittr %>%
#' @importFrom purrr walk
#' @noRd
close_connection <- function(uri){

    uri %>%
    get_connections %>%
    walk(close)

    invisible(NULL)

}

Having the functions above, you can do as Hadley has shown:

my_function <- function(my_url, ...) {
    on.exit(close_connection(my_url))
}
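
For anyone who wants to drop the package dependencies, here is a minimal base-R sketch of the same two helpers. The names get_connections_base and close_connection_base are hypothetical, made up for this illustration. It assumes, as the functions above do, that the connection's description column matches the uri exactly.

```r
# Hypothetical base-R counterparts of get_connections / close_connection.

# Return a list of connection objects whose description equals `uri`.
get_connections_base <- function(uri) {
    conns <- showConnections(all = TRUE)  # character matrix, row names are connection ids
    ids <- as.integer(rownames(conns)[conns[, "description"] == uri])
    lapply(ids, getConnection)
}

# Close (and thereby destroy) every connection pointing to `uri`.
close_connection_base <- function(uri) {
    invisible(lapply(get_connections_base(uri), close))
    invisible(NULL)
}
```

It can be used in on.exit() the same way as close_connection above.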

Upvotes: 2

hadley

Reputation: 103898

The easiest way to avoid problems like this is to explicitly close the connection when you're done with it. In R, the easiest way to do that is to use on.exit(), which ensures the url gets closed even if an error occurs in your code:

read_url <- function(url, ...) {
  on.exit(close(url))
  readLines(url, ...)
}
showConnections()
g <- read_url("http://www.google.com")
showConnections()
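
Since close() is a generic that expects a connection object rather than a character string, a variant that creates the connection explicitly may be more robust. This is only a sketch: the name read_url_con and the argument name address are made up here, and the argument is renamed so that it does not mask base R's url() function.

```r
# Hypothetical variant: create the connection object first, then register
# its cleanup before reading from it.
read_url_con <- function(address, ...) {
  con <- url(address)   # creates the connection; readLines() opens it when reading
  on.exit(close(con))   # close() destroys the connection and frees its slot
  readLines(con, ...)
}
```

Comparing showConnections(all = TRUE) before and after a call should show no leftover entry for the address.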

Upvotes: 13
