Ruby Regex to Match Both Unix And Windows File Paths

Question

The following instance method takes a file path and returns the file's prefix (the part before the separator):

@separator = "@"

def table_name path
  regex = Regexp.new("/[^/]+#{@separator}")
  path.match(regex)[0].gsub(/^.|.$/,'').downcase.to_sym
end

table_name "bla/bla/bla/Prefix@invoice.csv"
# => :prefix

So far, this method only works on Unix. To make it work on Windows, I also need to capture the backslash (\). Unfortunately, that's when I got stuck:

@separator = "@"

def table_name path
  regex = Regexp.new("(/|\\)[^/\\]+#{@separator}")
  path.match(regex)[0].gsub(/^.|.$/,'').downcase.to_sym
end

table_name("bla/bla/bla/Prefix@invoice.csv")
# RegexpError: premature end of char-class: /(/|\)[^/\]+@/

# Target result:
table_name("bla/bla/bla/Prefix@invoice.csv")
# => :prefix
table_name("bla\\bla\\bla\\Prefix@invoice.csv")
# => :prefix

I suspect Ruby's string interpolation and escaping are what confuse me here.

How could I change the regex to make it work on both Unix and Windows?

sarnold · Accepted Answer

I'm not sure what bla/bla/bla/Prefix@invoice.csv refers to: are bla/bla/bla all directories, and is Prefix@invoice.csv the filename?

Assuming I've understood your filenames correctly, I suggest using File.split():

irb> (path, name) = File.split("bla/bla/bla/Prefix@invoice.csv")
=> ["bla/bla/bla", "Prefix@invoice.csv"]
irb> (prefix, postfix) = name.split("@")
=> ["Prefix", "invoice.csv"]

Not only does it follow the path conventions of the platform Ruby is running on (on Windows, both / and \ count as separators), it is more legible too.
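As a rough sketch of how this could replace the original method: the separator keyword argument and the backslash normalisation are my own assumptions, not something from the question. File.split and File.basename only treat \ as a separator when Ruby itself runs on Windows, so the sketch converts backslashes to forward slashes first to behave the same everywhere.

```ruby
# Sketch only: returns the part of the file name before the separator as a
# downcased symbol, or nil when the file name has no usable prefix.
def table_name(path, separator: "@")
  # Assumption: "\" should count as a path separator on every platform.
  normalized = path.tr("\\", "/")
  name = File.basename(normalized)

  # Split on the first separator only; rest is nil when there is none.
  prefix, rest = name.split(separator, 2)
  return nil if rest.nil? || prefix.empty?

  prefix.downcase.to_sym
end

table_name("bla/bla/bla/Prefix@invoice.csv")     # => :prefix
table_name("bla\\bla\\bla\\Prefix@invoice.csv")  # => :prefix
table_name("bla/bla/bla/invoice.csv")            # => nil
```

Returning nil for a name without a separator is a design choice; raising an error may suit your application better.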

Update

You piqued my curiosity:

>> wpath="blah\\blah\\blah\\Prefix@invoice.csv"
=> "blah\\blah\\blah\\Prefix@invoice.csv"
>> upath="bla/bla/bla/Prefix@invoice.csv"
=> "bla/bla/bla/Prefix@invoice.csv"
>> r=Regexp.new(".+[\\\\/]([^@]+)@(.+)")
=> /.+[\\\/]([^@]+)@(.+)/
>> wpath.match(r)
=> #<MatchData "blah\\blah\\blah\\Prefix@invoice.csv" 1:"Prefix" 2:"invoice.csv">
>> upath.match(r)
=> #<MatchData "bla/bla/bla/Prefix@invoice.csv" 1:"Prefix" 2:"invoice.csv">

You're right, the \ must be double-escaped for it to work in a regular expression: once to get past the interpreter, again to get past the regex engine. (Definitely feels awkward.) The regex is:

.+[\\/]([^@]+)@(.+)

The string is:

".+[\\\\/]([^@]+)@(.+)"
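The two escaping layers can be checked directly. Here is a small sketch (the sample paths and variable names are made up for illustration) that compares a pattern built from a double-quoted string with the same pattern written as a %r{} literal, which skips the string layer:

```ruby
# The same pattern written two ways. In a double-quoted string each "\" meant
# for the regex engine has to be written as "\\", so the regex-level "\\"
# (one literal backslash) becomes "\\\\" in the string.
from_string  = Regexp.new("[\\\\/]([^@]+)@")
from_literal = %r{[\\/]([^@]+)@}

puts from_string.source                          # prints [\\/]([^@]+)@
puts from_string.source == from_literal.source   # prints true

# Hypothetical sample paths; str[regex, 1] returns the first capture group.
puts "dir\\Prefix@invoice.csv"[from_string, 1]   # prints Prefix
puts "dir/Prefix@invoice.csv"[from_literal, 1]   # prints Prefix
```

If the pattern does not need to be built from a string, the literal form is probably the easier one to read.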

The regex looks for any number of characters, a single path separator, any number of non-@ characters, an @, then any number of any characters. It might be too brittle for real use: how would it handle a path without / or \ separators, or a pathname with no @ or with too many @? I'm assuming that the first .+ will greedily consume as many characters as possible, pushing the match as far to the right as possible:

>> evil_path="/foo/bar@baz/blorp/Prefix@invoice.csv"
=> "/foo/bar@baz/blorp/Prefix@invoice.csv"
>> evil_path.match(r)
=> #<MatchData "/foo/bar@baz/blorp/Prefix@invoice.csv" 1:"Prefix" 2:"invoice.csv">

But depending on how malformed the input data is, it might do entirely the wrong thing.
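One way to tighten this up, as a sketch rather than a vetted solution: anchor the pattern to the end of the string and forbid separators after the @, so that an @ in a directory name can never be picked up. The constant PREFIX_PATTERN and the helper prefix_of are names I made up for illustration.

```ruby
# Hypothetical stricter pattern:
#   (?:\A|[\\/])  start of string or a path separator
#   ([^\\/@]+)    the prefix: no separators, no "@"
#   @[^\\/]*\z    "@", then the rest of the file name up to the end
PREFIX_PATTERN = %r{(?:\A|[\\/])([^\\/@]+)@[^\\/]*\z}

# Returns the prefix of the last path component, or nil if it has none.
def prefix_of(path)
  path[PREFIX_PATTERN, 1]
end

prefix_of("bla/bla/bla/Prefix@invoice.csv")         # => "Prefix"
prefix_of("bla\\bla\\bla\\Prefix@invoice.csv")      # => "Prefix"
prefix_of("Prefix@invoice.csv")                     # => "Prefix"
prefix_of("/foo/bar@baz/blorp/Prefix@invoice.csv")  # => "Prefix"
prefix_of("/foo/bar@baz/invoice.csv")               # => nil
```

With several @ in the file name this sketch takes everything before the first one as the prefix; whether that is right depends on your naming scheme.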
