ANieder

Reputation: 223

Timeout while reading csv file from url in R

I currently have an R script with a for loop that runs around 2000 times; on each iteration it queries data from a database via a URL, using the read.csv function to read the result into a variable.

My problem is: when I query small amounts of data (around 10000 rows) it takes around 12 seconds per loop and that's fine. But now I need to query around 50000 rows per loop, and the query time increases quite a lot, to 50 seconds or so per loop. That would still be acceptable, but sometimes the server takes longer to send the data (≈75-90 seconds), the connection apparently times out, and I get one of these errors:

Error in file(file, "rt") : cannot open the connection

In addition: Warning message:

In file(file, "rt") : cannot open: HTTP status was '0 (nil)'

or this one:

Error in file(file, "rt") : cannot open the connection

In addition: Warning message:

In file(file, "rt") : InternetOpenUrl failed: 'The operation timed out'

I don't get the same warning every time; it alternates between those two.

Now, what I want is to prevent my program from stopping when this happens, or simply to avoid the timeout error by telling R to wait longer for the data. I have tried these settings at the start of my script as a possible solution, but the error keeps happening.

options(timeout=190)
setInternet2(use=NA)
setInternet2(use=FALSE)
setInternet2(use=NA)

Any other suggestions or workarounds? Maybe skip to the next iteration when this happens, and store in a variable the loop numbers where the error occurred, so that only those i's that were skipped due to the connection error can be queried again at the end? The ideal solution would, of course, be to avoid the error altogether.

Upvotes: 5

Views: 9548

Answers (3)

acarter

Reputation: 21

I see this is an older post, but it still comes up early in the list of Google results, so...

If you are downloading via WinInet (rather than curl, internal, wget, etc.), options, including the timeout, are inherited from the system. Thus you cannot set the timeout in R; you must change the Internet Explorer settings. See these Microsoft references for details:
https://support.microsoft.com/en-us/kb/181050
https://support.microsoft.com/en-us/kb/193625
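If switching away from WinInet is possible (on a reasonably recent R where the libcurl method is available), a minimal sketch of that alternative, using placeholder names (my_url, destfile) for illustration:

    # use libcurl instead of WinInet so that options(timeout=) is honoured
    options(download.file.method = "libcurl", timeout = 200)
    destfile <- tempfile(fileext = ".csv")
    download.file(my_url, destfile)   # my_url stands in for the query URL
    dat <- read.csv(destfile)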

Upvotes: 2

Richie Cotton

Reputation: 121077

A solution using the RCurl package:

You can change the timeout option using

curlSetOpt(timeout = 200)

or by passing it into the call to getURL

getURL(url_vect[i], timeout = 200)
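For the question's use case, a minimal sketch of the second option (assuming url_vect[i] is the query URL from the question) that parses the downloaded text in memory:

    library(RCurl)

    # fetch the raw CSV text with a 200-second timeout, then parse it
    raw <- getURL(url_vect[i], timeout = 200)
    dat <- read.csv(text = raw)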

A solution using base R:

Simply download each file using download.file, and then worry about manipulating those files later.
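For example, a sketch reusing the url_vect[i] name from the question:

    # options(timeout=) applies to download.file (internal/libcurl methods)
    options(timeout = 200)
    destfile <- sprintf("query_%04d.csv", i)   # one file per loop iteration
    download.file(url_vect[i], destfile, mode = "wb")
    dat <- read.csv(destfile)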

Upvotes: 4

alap

Reputation: 647

This is only partial code, but you can modify it to your needs:

    library(RCurl)   # for getURL

    # connect to the website; catch the error instead of stopping the whole loop
    webpage <- tryCatch(
        getURL(url_vect[i]),
        error = function(e) {
            message("Connection failed for i = ", i, ": ", conditionMessage(e))
            NULL                 # return NULL so this iteration can be skipped
        },
        finally = print(" Success.")   # runs whether or not an error occurred
    )

In my case, url_vect[i] was one of the URLs I was copying information from. Sadly, this will increase the time you need to wait for the program to finish.
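To also retry the failed iterations at the end, as the question suggests, here is a minimal sketch (assuming RCurl's getURL and the url_vect / webpage names used above):

    library(RCurl)

    failed <- integer(0)                     # loop indices that timed out
    for (i in seq_along(url_vect)) {
        webpage <- tryCatch(
            getURL(url_vect[i], timeout = 200),
            error = function(e) NULL         # swallow the error, mark as failed
        )
        if (is.null(webpage)) failed <- c(failed, i)
        # ... process webpage when it is not NULL ...
    }

    # query again, but only the i's that were skipped the first time
    for (i in failed) {
        webpage <- getURL(url_vect[i], timeout = 200)
    }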

UPDATED

tryCatch how to example

Upvotes: 0
