Tianjian Qin

Reputation: 630

RCurl::getURL how to only list URLs of files inside a folder

I want to list the files in a remote SFTP server, so I did this:

    url <- "sftp://remoteserver.com/dir/"
    credentials <- "myusrname:mypwd"  # libcurl expects "user:password"

    file_list <- tryCatch({
    
        RCurl::getURL(
          url,
          userpwd = credentials,
          ftp.use.epsv = FALSE,
          dirlistonly = TRUE,
          forbid.reuse = TRUE,
          .encoding = "UTF-8"
        )
    
      }, error = function(e) {
        character(0)  # fall back to an empty vector on failure
      })

However, besides the URLs of the files in that folder, file_list also contains some extra entries that I don't need:

    # at the beginning of the vector
    [1] "sftp://remoteserver.com/dir/."
    [2] "sftp://remoteserver.com/dir/.."

    # at the end of the vector
    [67] "sftp://remoteserver.com/dir/"

Is there a way to avoid these entries? Is it safe to use the following code to just delete them?

    file_list <- file_list[c(-1, -2)]
    file_list <- file_list[-length(file_list)]

Upvotes: 1

Views: 641

Answers (1)

Ben G

Reputation: 4338

I wouldn't remove them by position, because the entries may not always come back in that order. If you want every file whose name ends in .logs, I'd do something like this:

    library(dplyr)
    library(stringr)

    file_list <- c(
      "sftp://remoteserver.com/dir/.",
      "sftp://remoteserver.com/dir/..",
      "sftp://remoteserver.com/dir/names.logs",
      "sftp://remoteserver.com/dir/"
    )

    as_tibble(file_list) %>% # I find it easier to think of things as data frames
      filter(str_detect(value, "\\.logs$")) %>% # keep only entries ending in ".logs"
      pull(value)


    [1] "sftp://remoteserver.com/dir/names.logs"
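
If the files don't all share one extension, a base R filter that drops the unwanted entries by pattern rather than by position might be enough. This is only a rough sketch: `drop_non_files` is a hypothetical helper name (not part of RCurl), and it assumes the unwanted entries are always the "." and ".." entries plus any URL ending in a slash, as in the question.

```r
# Hypothetical helper: keep only entries that look like files,
# wherever ".", ".." or the bare directory URL appear in the vector.
drop_non_files <- function(urls) {
  is_dir_url <- grepl("/$", urls)          # bare directory URL, e.g. ".../dir/"
  is_dot     <- grepl("/\\.{1,2}$", urls)  # the "." and ".." entries
  urls[!(is_dir_url | is_dot)]
}

file_list <- c(
  "sftp://remoteserver.com/dir/.",
  "sftp://remoteserver.com/dir/..",
  "sftp://remoteserver.com/dir/names.logs",
  "sftp://remoteserver.com/dir/"
)

drop_non_files(file_list)
# [1] "sftp://remoteserver.com/dir/names.logs"
```

Because it tests each entry rather than relying on its index, this also returns an empty vector unchanged, which matters if the `tryCatch` fallback in the question kicks in.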

Upvotes: 1
