Erik Peterson

Reputation: 115

Nifi ftp fails with path non-existent

Using the NiFi ListFTP and GetFTP processors, I can access remote FTP directories and files as expected, except for this path:

/Oa 45° 25t 32rn

I get a non-existent path error. Other paths with spaces work fine, and other clients such as FileZilla handle this path without problems, but NiFi does not. If the degree character (°) is the cause, how do I escape it? I've tried:

  1. "/Oa 45° 25t 32rn"
  2. '/Oa 45° 25t 32rn'
  3. '"'/Oa 45° 25t 32rn'"'
  4. /Oa\ 45°\ 25t\ 32rn
  5. Oa%2045%C2%B0%2025t%2032rn (url encoding, trying it all)

Any ideas why this is failing and how to resolve? Thanks.

Upvotes: 2

Views: 396

Answers (1)

Andy

Reputation: 14194

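I do not have an FTP server with a directory containing non-ASCII characters, so I cannot test this explicitly, but I would recommend trying the UTF-8 encoding of the degree sign to see if that works. In UTF-8 the character is the two bytes 0xC2 0xB0; its Unicode code point is U+00B0, so the escape form is \u00B0 (not \uC2B0, which is a different character).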
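To make the byte values concrete, here is a minimal Python sketch (not NiFi code, and not tested against any FTP server) that shows how the degree sign from the path in the question is represented. The variable names are only illustrative. The point is that 0xC2 0xB0 is the UTF-8 byte sequence, while the code point itself is U+00B0.

```python
# The directory name from the question.
path = "/Oa 45° 25t 32rn"

# DEGREE SIGN is Unicode code point U+00B0.
degree = "\u00b0"

# The same character under two encodings a server or client might use.
utf8_bytes = degree.encode("utf-8")      # two bytes: 0xC2 0xB0
latin1_bytes = degree.encode("latin-1")  # one byte:  0xB0

print(hex(ord(degree)))     # code point
print(utf8_bytes.hex())     # UTF-8 bytes as hex
print(latin1_bytes.hex())   # ISO-8859-1 byte as hex

# "\uC2B0" names code point U+C2B0, which is an unrelated character.
print("\uc2b0" == degree)
```

If the server stores the name as a single 0xB0 byte but the client sends 0xC2 0xB0 (or the other way round), the two sides are not talking about the same path, which would be consistent with a "non-existent path" error.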

From FileZilla Character Encoding:

The FTP protocol is specified in RFC 959, which was published in 1985. The FTP protocol is designed on top of the original Telnet protocol, which is specified in RFC 854. The relevant sections of the Telnet specification regarding FTP are those covering the Network Virtual Terminal (NVT). According to RFC 854, the NVT requires the use of (7-bit) ASCII as the character set. Use of any other character set requires explicit negotiation. This character set only contains 127 different characters: English letters and numbers, punctuation characters and a few control characters. Accented letters, umlauts or other scripts are not contained in the ASCII character set.

In order to support non-English characters, the FTP specifications were extended in 1999 in RFC 2640. This extension requires the use of UTF-8 as the character set. This character set is a strict superset of ASCII, every valid ASCII character is also the same character in UTF-8. The UTF-8 character set can display any valid Unicode character. That includes umlauts, accented letters and also different scripts. This extension is fully backwards compatible with RFC 959.

As long as you're using only English characters, it doesn't matter if the software you are using supports RFC 2640 or not. However, if you use non-English characters without using RFC 2640 compatible software, there will be problems--problems which are entirely self-made by not obeying the specifications.
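The compatibility claims quoted above can be illustrated with a small Python simulation. This is only a sketch of what a charset mismatch does to a name; `as_seen_by_peer` is a hypothetical helper, and nothing here reflects how NiFi or any particular FTP server actually behaves.

```python
def as_seen_by_peer(name: str, sender: str, receiver: str) -> str:
    """Encode a name with the sender's charset and decode it with the receiver's."""
    return name.encode(sender).decode(receiver, errors="replace")


path = "/Oa 45° 25t 32rn"        # path from the question
ascii_only = "/Oa 45 25t 32rn"   # same path without the degree sign

# ASCII-only names are byte-identical in ASCII and UTF-8,
# so they survive regardless of RFC 2640 support.
same_bytes = ascii_only.encode("ascii") == ascii_only.encode("utf-8")

# Both sides use UTF-8: the name round-trips unchanged.
matched = as_seen_by_peer(path, "utf-8", "utf-8")

# Client sends UTF-8, peer assumes ISO-8859-1: an extra character appears.
utf8_read_as_latin1 = as_seen_by_peer(path, "utf-8", "latin-1")

# Client sends ISO-8859-1, peer assumes UTF-8: the byte is invalid.
latin1_read_as_utf8 = as_seen_by_peer(path, "latin-1", "utf-8")

print(same_bytes)
print(matched)
print(utf8_read_as_latin1)
print(latin1_read_as_utf8)
```

In both mismatched cases the receiving side ends up with a different string than the one that exists on disk, which is why paths containing only spaces and ASCII letters work while this one might not.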

Upvotes: 2
