The following instance method takes a file path and returns the file's prefix (the part before the separator):
```ruby
@separator = "@"

def table_name path
  regex = Regexp.new("\/[^\/]+#{@separator}")
  path.match(regex)[0].gsub(/^.|.$/,'').downcase.to_sym
end

table_name "bla/bla/bla/Prefix@invoice.csv"
# => :prefix
```
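Here is the same logic unrolled one step at a time on the sample path, which shows what each call contributes. This is a sketch: the separator is held in a local variable `separator` instead of `@separator` so the snippet runs on its own.

```ruby
separator = "@"  # local stand-in for the question's @separator
path = "bla/bla/bla/Prefix@invoice.csv"

# Inside a double-quoted string "\/" is just "/", so the pattern is /[^/]+@
regex = Regexp.new("\/[^\/]+#{separator}")

matched  = path.match(regex)[0]        # "/Prefix@"
stripped = matched.gsub(/^.|.$/, '')   # first and last characters removed: "Prefix"
name     = stripped.downcase.to_sym    # :prefix

# A backslash-separated path contains no "/", so match returns nil,
# and the original method would then fail when it calls [0] on nil.
windows_match = "bla\\bla\\bla\\Prefix@invoice.csv".match(regex)
```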
So far, this method only works on Unix. To make it work on Windows, the pattern also has to match the backslash (\) as a separator. Unfortunately, that's where I got stuck:
```ruby
@separator = "@"

def table_name path
  regex = Regexp.new("(\/|\\)[^\/\\]+#{@separator}")
  path.match(regex)[0].gsub(/^.|.$/,'').downcase.to_sym
end

table_name("bla/bla/bla/Prefix@invoice.csv")
# RegexpError: premature end of char-class: /(\/|\)[^\/\]+@/

# Target result:
table_name("bla/bla/bla/Prefix@invoice.csv")
# => :prefix
table_name("bla\\bla\\bla\\Prefix@invoice.csv")  # "\\" is one backslash here
# => :prefix
```
I suspect Ruby's string interpolation and escaping are what confuse me here.
How could I change the regex to make it work on both Unix and Windows?
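The error can be traced to the first layer of escaping. Inside a double-quoted Ruby string, `"\\"` is already a single backslash, so the regex engine receives `\)` and `\]`, which it reads as a literal `)` and a literal `]`; the `[` that opens the character class is therefore never closed. A small sketch checking this (the variable names are illustrative only):

```ruby
broken = "(\/|\\)[^\/\\]+@"
# What the regex engine actually receives: (/|\)[^/\]+@

error = begin
  Regexp.new(broken)
  nil
rescue RegexpError => e
  e  # "premature end of char-class"
end

# Doubling each backslash in the string literal leaves "\\" in the pattern,
# which the regex engine reads as one literal backslash.
fixed = Regexp.new("(\/|\\\\)[^\/\\\\]+@")
```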
I don't actually know what `bla/bla/bla/Prefix@invoice.csv` refers to; are `bla/bla/bla` all directories, and is the filename `Prefix@invoice.csv`?
With the assumption that I've correctly understood your filenames, I suggest using `File.split()`:
```
irb> (path, name) = File.split("bla/bla/bla/Prefix@invoice.csv")
=> ["bla/bla/bla", "Prefix@invoice.csv"]
irb> (prefix, postfix) = name.split("@")
=> ["Prefix", "invoice.csv"]
```
Not only does it leave separator handling to Ruby, which knows the separators of the platform it runs on, it is more legible too.
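One caveat, as I understand Ruby's behavior: `File.split` uses the separators of the host platform. On Windows, `File::ALT_SEPARATOR` is `"\\"` and backslashes are treated as separators; elsewhere `File::ALT_SEPARATOR` is `nil`, so a Windows-style path is not split on Unix. If one program must read both styles regardless of host, a sketch that splits on either character explicitly (`portable_prefix` is a hypothetical name):

```ruby
# Hypothetical helper: take the last path component, splitting on either
# "/" or "\" no matter which OS Ruby runs on, then keep the part before
# the first "@".
def portable_prefix(path)
  name = path.split(%r{[\\/]}).last
  name.split("@", 2).first
end
```

Calling `portable_prefix("bla\\bla\\bla\\Prefix@invoice.csv")` gives `"Prefix"` on any platform, at the cost of no longer following the host's own path rules.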
Update
You piqued my curiosity:
```
>> wpath="blah\\blah\\blah\\Prefix@invoice.csv"
=> "blah\\blah\\blah\\Prefix@invoice.csv"
>> upath="bla/bla/bla/Prefix@invoice.csv"
=> "bla/bla/bla/Prefix@invoice.csv"
>> r=Regexp.new(".+[\\\\/]([^@]+)@(.+)")
=> /.+[\\\/]([^@]+)@(.+)/
>> wpath.match(r)
=> #<MatchData "blah\\blah\\blah\\Prefix@invoice.csv" 1:"Prefix" 2:"invoice.csv">
>> upath.match(r)
=> #<MatchData "bla/bla/bla/Prefix@invoice.csv" 1:"Prefix" 2:"invoice.csv">
```
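Plugged back into the question's `table_name`, the pattern might look like the sketch below. `PREFIX_REGEX` is a hypothetical constant name, the `@` is hard-coded rather than read from `@separator`, and the method returns `nil` instead of raising when the path does not fit.

```ruby
# Group 1 is everything between the last separator and the "@".
PREFIX_REGEX = Regexp.new(".+[\\\\/]([^@]+)@(.+)")

def table_name(path)
  match = path.match(PREFIX_REGEX)
  match && match[1].downcase.to_sym  # nil when there is no match
end
```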
You're right, the `\` must be double-escaped for it to work in a regular expression built from a string: once to get past Ruby's string-literal parsing, and again to get past the regex engine. (Definitely feels awkward.) The regex is:
`.+[\\/]([^@]+)@(.+)`
The string is:
`".+[\\\\/]([^@]+)@(.+)"`
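One way to drop a layer of escaping is a regex literal: `%r{...}` is parsed as a regex directly, so `\\` is written only once and `/` needs no escape. Regex literals also support `#{}` interpolation, so a configurable separator like the question's `@separator` still fits. A sketch (the local `separator` is an assumption standing in for `@separator`; `Regexp.escape` guards against separators that happen to be regex metacharacters):

```ruby
separator = "@"

# Built from a string: every regex backslash is doubled again.
from_string = Regexp.new(".+[\\\\/]([^@]+)@(.+)")

# Built as a literal: one level of escaping, with interpolation.
sep = Regexp.escape(separator)
from_literal = %r{.+[\\/]([^#{sep}]+)#{sep}(.+)}
```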
The regex, which might be too brittle for real use (how would it handle a path without `/` or `\` separators, or a file name with no `@` or with more than one `@`?), looks for one or more characters of any kind, a single path separator, one or more non-`@` characters, an `@`, then one or more characters of any kind. I'm relying on the first `.+` being greedy: it consumes as many characters as possible, so the separator that matches is the rightmost one that still lets the rest of the pattern succeed:
```
>> evil_path="/foo/bar@baz/blorp/Prefix@invoice.csv"
=> "/foo/bar@baz/blorp/Prefix@invoice.csv"
>> evil_path.match(r)
=> #<MatchData "/foo/bar@baz/blorp/Prefix@invoice.csv" 1:"Prefix" 2:"invoice.csv">
```
But with malformed input data, it might do exactly the wrong thing.
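If malformed input should produce no answer rather than a wrong one, a stricter, anchored pattern is one option. This is a sketch under an assumption of mine: the prefix is the part of the last path component before its first `@`. `STRICT_PREFIX` and `strict_prefix` are hypothetical names.

```ruby
# The prefix must start at the beginning of the string or right after a
# separator, may not contain "@" or a separator, and the text after "@"
# must run to the end of the string without crossing another separator.
STRICT_PREFIX = %r{(?:\A|[\\/])([^\\/@]+)@([^\\/]*)\z}

def strict_prefix(path)
  m = STRICT_PREFIX.match(path)
  m && m[1]  # nil instead of a guess when the input does not fit
end
```

With more than one `@` in the file name, this version splits at the first one; whether that is right depends on how the files are actually named.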