pyronaur

Reputation: 3545

Ruby FTP Separating files from Folders

I'm trying to crawl FTP and pull down all the files recursively.

Until now I have been separating the entries of a directory listing like this:

    ftp.list.each do |entry|
      if entry.split(/\s+/)[0][0, 1] == "d"
        out[:dirs] << entry.split.last unless black_dirs.include? entry.split.last
      else
        out[:files] << entry.split.last unless black_files.include? entry.split.last
      end
    end

But it turns out that taking everything after the last space breaks file and directory names that contain spaces. I need a little help with the logic here.

Upvotes: 6

Views: 3892

Answers (6)

Brett Green

Reputation: 3755

I'll add my solution to the mix...

Using ftp.nlst('**/*.*') did not work for me; my server doesn't seem to support the ** syntax.

The chdir trick with a rescue seems expensive and hackish.

Assuming that all files have at least one char, a single period, and then an extension, I did a simple recursion.

  def list_all_files(ftp, folder)
    entries = ftp.nlst(folder)
    file_regex = /.+\.{1}.*/
    files = entries.select{|e| e.match(file_regex)}
    subfolders = entries.reject{|e| e.match(file_regex)}
    subfolders.each do |subfolder|
      files += list_all_files(ftp, subfolder)
    end
    files
  end

nlst seems to return the full path of whatever it finds, without recursing. So each time you get a listing, separate the files from the folders, then process every folder you find recursively and collect all the file results.

To call it, pass a starting folder. Any of these forms works:

files = list_all_files(ftp, "my_starting_folder/my_sub_folder")
files = list_all_files(ftp, ".")
files = list_all_files(ftp, "")
files = list_all_files(ftp, nil)

Upvotes: 0

benjamin ratelade

Reputation: 273

As @Alex pointed out, using patterns in filenames for this is hardly reliable. Directories CAN have dots in their names (.ssh for example), and listings can be very different on different servers.

His method works but, as he himself points out, it takes too long. I prefer using the .size method from Net::FTP. It returns the size of a file, or raises an error if the item is a directory.

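require 'net/ftp'

# Returns true if the server reports a size for item.
# Returns false if the server refuses SIZE, as it does for a directory.
# Takes an already open connection instead of opening a new one per item.
def item_is_file?(ftp, item)
    ftp.size(item).is_a?(Numeric)
rescue Net::FTPPermError
    false
end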

Upvotes: 0

pczora

Reputation: 317

Assuming that the FTP server returns Unix-like file listings, the following code works. At least for me.

regex = /^d[rwx-]+\s+\d+\s+\S+\s+\S+\s+\d+\s+\w+\s+\d+\s+[\d:]+\s(.+)/
ftp.ls.each do |line|
    if dir = line.match(regex)
        puts dir[1]
    end
end

dir[1] contains the name of the directory (given that the inspected line actually represents a directory).
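
Under the same assumption (Unix-like listings with exactly eight whitespace-separated fields before the name), a split with a limit is another way to keep spaces in names intact. This is only a sketch: parse_unix_entry is a made-up helper name, and the sample lines are invented, not taken from a real server.

```ruby
# Parses one Unix-style listing line. Returns nil for lines that do not have
# the assumed nine fields (for example a "total 8" header).
def parse_unix_entry(line)
  # Limit of 9: the ninth element is everything after the eighth field,
  # so spaces inside the name survive.
  fields = line.split(/\s+/, 9)
  return nil if fields.length < 9
  { dir: fields[0].start_with?('d'), name: fields[8] }
end

lines = [
  'drwxr-xr-x   2 alice staff     4096 Jan 05 10:12 My Folder',
  '-rw-r--r--   1 alice staff      120 Jan 05 10:13 notes for bob.txt'
]
entries = lines.map { |l| parse_unix_entry(l) }.compact
dirs  = entries.select { |e| e[:dir] }.map { |e| e[:name] }
files = entries.reject { |e| e[:dir] }.map { |e| e[:name] }
```

Listings from non-Unix servers will not fit this layout, so treat a nil result as "unknown" rather than as a file.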

Upvotes: 2

Alex Kovshovik

Reputation: 4215

There is a huge variety of FTP servers around.

We have clients who use obscure proprietary Windows-based servers, and the file listings they return look completely different from those of Linux servers.

So what I ended up doing is this: for each file/directory entry I try changing directory into it, and if that doesn't work I consider it a file :)

The following method is "bullet proof":

# Checks if the given file_name is actually a file.
# A failed chdir (permanent FTP error) is taken to mean "not a directory".
def is_ftp_file?(ftp, file_name)
  ftp.chdir(file_name)
  ftp.chdir('..')
  false
rescue Net::FTPPermError
  true
end

file_names = ftp.nlst.select {|fname| is_ftp_file?(ftp, fname)}

Works like a charm, but please note: every entry costs at least one extra round trip to the server, so if the FTP directory has tons of files in it, this method takes a while to get through all of them.
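
For the recursive case from the question, the same chdir probe could be folded into the crawl itself. This is a rough sketch, not tested against a real server: crawl_files and prefix are made-up names, it assumes nlst without arguments returns bare names for the current directory, and it does not handle servers that raise an error when nlst is called on an empty directory.

```ruby
require 'net/ftp'

# Collects the paths of all files below the current remote directory.
# A name is treated as a directory if chdir into it succeeds.
def crawl_files(ftp, prefix = '')
  files = []
  ftp.nlst.each do |name|
    next if name == '.' || name == '..'
    path = prefix.empty? ? name : "#{prefix}/#{name}"
    begin
      ftp.chdir(name)
    rescue Net::FTPPermError
      # chdir refused: consider it a file.
      files << path
      next
    end
    # We are inside the subdirectory now: recurse, then go back up.
    files.concat(crawl_files(ftp, path))
    ftp.chdir('..')
  end
  files
end
```

Usage would be something like ftp.chdir(start_dir) followed by crawl_files(ftp). Names with spaces or dots need no special handling, because the listing lines are never parsed.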

Upvotes: 3

iltempo

Reputation: 16012

You can avoid recursion if you list all files at once:

files = ftp.nlst('**/*.*')

Directories are not included in the list but the full ftp path is still available in the name.

EDIT

I'm assuming that every file name contains a dot and that directory names don't. Thanks to @Niklas B. for pointing this out.

Upvotes: 4

iltempo

Reputation: 16012

You can also use a regular expression. I put one together. Please verify that it works for you as well, since I don't know whether your directory listing looks different. You need Ruby 1.9 for the named captures, by the way.

reg = /^(?<type>.{1})(?<mode>\S+)\s+(?<number>\d+)\s+(?<owner>\S+)\s+(?<group>\S+)\s+(?<size>\d+)\s+(?<mod_time>.{12})\s+(?<path>.+)$/

match = entry.match(reg)

You can then access the elements by name.

match[:type] contains a 'd' if it's a directory, or a '-' if it's a regular file (other characters, such as 'l' for a symlink, are possible too).

All the other elements are there as well. Most importantly match[:path].
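
Applied to the loop from the question, it could look roughly like this. It is only a sketch: partition_listing and LISTING_LINE are made-up names, black_dirs and black_files are the blacklists from the question, and the sample lines are invented, so check the result against what your server really returns.

```ruby
# The named-capture pattern from above, kept in a constant.
LISTING_LINE = /^(?<type>.{1})(?<mode>\S+)\s+(?<number>\d+)\s+(?<owner>\S+)\s+(?<group>\S+)\s+(?<size>\d+)\s+(?<mod_time>.{12})\s+(?<path>.+)$/

# Splits raw listing lines into directory names and file names.
# Lines that do not match the pattern (e.g. a "total 8" header) are skipped.
def partition_listing(lines, black_dirs = [], black_files = [])
  out = { dirs: [], files: [] }
  lines.each do |line|
    match = LISTING_LINE.match(line)
    next unless match
    name = match[:path]
    if match[:type] == 'd'
      out[:dirs] << name unless black_dirs.include?(name)
    else
      out[:files] << name unless black_files.include?(name)
    end
  end
  out
end

sample = [
  'drwxr-xr-x   2 alice staff     4096 Jan 05 10:12 My Folder',
  '-rw-r--r--   1 alice staff      120 Jan 05 10:13 notes for bob.txt',
  'drwxr-xr-x   2 alice staff     4096 Jan 05 10:14 tmp'
]
out = partition_listing(sample, ['tmp'])
# out[:dirs]  is ["My Folder"]
# out[:files] is ["notes for bob.txt"]
```

With a live connection you would pass ftp.list instead of sample.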

Upvotes: 2
