Reputation: 41

How do you use split and scan to parse a URI in ruby?

Let's say I have this string in Ruby

str = "/server/ab/file.html"

I want to get an array that contains

["/server/", "/ab/", "file.html"]

Is there a way to obtain this array using split or scan? I have tried all kinds of combinations with nothing matching exactly what I want. I can't use any outside libraries. Any ideas? Thanks.

Upvotes: 1

Answers (3)

Aleksei Matiushkin

Reputation: 121010

▶ str.gsub(/(?<=\/)([\w.]+)(\/)?/).map { |m| "#{$2 && '/'}#{m}" } 
#⇒ [ "/server/", "/ab/", "file.html" ]

Or with scan, which is more semantic:

▶ str.scan(/(?<=\/)([\w.]+)(\/)?/).map { |(val,slash)| slash ? "/#{val}/" : val }
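
To see why the scan variant works, it may help to look at what scan returns before map runs. A minimal sketch, assuming the same str as in the question (the names pairs and result are only illustrative):

```ruby
str = "/server/ab/file.html"

# The lookbehind (?<=\/) requires a preceding '/' without consuming it;
# group 1 captures the segment, group 2 the optional trailing '/'.
pairs = str.scan(/(?<=\/)([\w.]+)(\/)?/)
p pairs
# => [["server", "/"], ["ab", "/"], ["file.html", nil]]

# A segment followed by '/' is treated as a directory and wrapped in
# slashes; the last segment has no trailing '/', so it is kept as is.
result = pairs.map { |(val, slash)| slash ? "/#{val}/" : val }
p result
# => ["/server/", "/ab/", "file.html"]
```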

Probably the fastest solution:

▶ a = str[1..-1].split('/')
▶ [*a[0..-2].map { |e| "/#{e}/"}, a[-1]]
#⇒ ["/server/", "/ab/", "file.html"]

Completely in-place array change (hey, aesthetes):

▶ a = str[1..-1].split('/')
▶ a.pop.tap do |e| 
▷   a.map! do |e| 
▷     [-1, 0].each do |i| 
▷       e.insert(i, '/')
▷     end
▷     e
▷   end.push e
▷ end
▶ a
#⇒ ["/server/", "/ab/", "file.html"]

Upvotes: 2

Myst

Reputation: 19221

As @sawa stated, the issue is the doubled '/': the separator between "server" and "ab" has to appear in both elements, so a plain split is not enough and you have to manipulate the strings.

The most direct solution I can think of is:

# removes the '/' at the beginning of the string
# and splits the string to an array
a = str.sub(/^\//, '').split('/') # => ["server", "ab", "file.html"]

# iterates through the array objects EXCEPT the last one,
# (notice three dots '...' instead of two '..'),
# and adds the missing '/'
a[0...-1].each {|s| s << '/'; s.insert(0 , '/')} # => ["/server/", "/ab/"]

a # => ["/server/", "/ab/", "file.html"]
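
The same steps could be wrapped in a small method that does not mutate the split strings. This is only a sketch: the name split_path is hypothetical, and it assumes the last segment is always a file name that should stay unwrapped.

```ruby
# Hypothetical helper (not part of the answer above): splits a path into
# slash-wrapped directory segments plus the final segment unchanged.
def split_path(path)
  # Drop one leading '/' if present, then split on the remaining ones.
  parts = path.sub(/\A\//, '').split('/')
  return parts if parts.empty?

  # Everything except the last element is a directory segment.
  *dirs, last = parts
  dirs.map { |s| "/#{s}/" } << last
end

p split_path("/server/ab/file.html") # => ["/server/", "/ab/", "file.html"]
p split_path("file.html")            # => ["file.html"]
```

Using sub instead of str[1..-1] means a path without a leading '/' is handled as well, at the cost of a regex match.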

EDIT 2

Following up on @mudasobwa's concepts, ideas and input: if you know that the first character is always a '/', this would be the fastest solution so far (see the edited benchmark):

        a = str[1..-1].split('/')
        a << (a.pop.tap { a.map! {|s| "/#{s}/" } } )
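
The one-liner relies on evaluation order: a.pop removes the last element first, the block passed to tap then rewrites the remaining elements in place, and tap returns the popped element so that << can append it again. Spelled out as separate steps (a sketch; the variable last is introduced only for illustration):

```ruby
str = "/server/ab/file.html"

a = str[1..-1].split('/')   # ["server", "ab", "file.html"]
last = a.pop                # "file.html"; a is now ["server", "ab"]
a.map! { |s| "/#{s}/" }     # a is now ["/server/", "/ab/"]
a << last                   # put the file name back, untouched

p a
# => ["/server/", "/ab/", "file.html"]
```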

Good Luck.

Benchmarks

After reading @mudasobwa's answer I was super impressed. I wanted to know how much faster his solution was...

... and I was surprised to see that although his solution is much more elegant looking, it's substantially slower.

I have no idea why, but it seems that the Regexp lookup using gsub or scan is slower in this case.

Here's the benchmark, for anyone interested (iterations per second - higher numbers are better):

require 'benchmark/ips'

str = "/server/ab/file.html"
Benchmark.ips do |b|

    b.report("split") do
        a = str.sub(/^\//, '').split('/')
        a[0...-1].each {|s| s << '/'; s.insert(0 , '/')}
    end
    b.report("updated split") do
        a = str[1..-1].split('/')
        a[0...-1].each {|s| s << '/'; s.insert(0 , '/')}
    end
    b.report("scan") do
        str.scan(/(?<=\/)([\w.]+)(\/)?/).map { |(val,slash)| slash ? "/#{val}/" : val }
    end
    b.report("gsub") do
        str.gsub(/(?<=\/)([\w.]+)(\/)?/).map { |m| "#{$2 && '/'}#{m}" }
    end
    b.report("mudasobwa's varient") do
        a = str[1..-1].split('/')
        [*a[0..-2].map { |e| "/#{e}/"}, a[-1]]
    end
    b.report("mudasobwa's tap concept") do
        a = str[1..-1].split('/')
        a << (a.pop.tap { a.map! {|s| "/#{s}/" } })
    end

end; nil

# results:
#
# Calculating -------------------------------------
#                split    39.378k i/100ms
#        updated split    45.530k i/100ms
#                 scan    23.910k i/100ms
#                 gsub    18.006k i/100ms
#  mudasobwa's varient    47.389k i/100ms
# mudasobwa's tap concept
#                         51.895k i/100ms
# -------------------------------------------------
#                split    517.487k (± 2.9%) i/s -      2.599M
#        updated split    653.271k (± 6.4%) i/s -      3.278M
#                 scan    268.048k (± 6.9%) i/s -      1.339M
#                 gsub    202.457k (± 3.2%) i/s -      1.026M
#  mudasobwa's varient    656.734k (± 4.8%) i/s -      3.317M
# mudasobwa's tap concept
#                         761.914k (± 3.2%) i/s -      3.840M
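
Speed comparisons only mean something if the variants return the same result. A quick sanity check, as a sketch, that compares the return values of four of the approaches benchmarked above (the hash keys and lambda wrappers are made up for this example, and each lambda returns the full array rather than whatever the benchmark block happens to return):

```ruby
str = "/server/ab/file.html"
re  = /(?<=\/)([\w.]+)(\/)?/

variants = {
  "split" => lambda { |s|
    a = s.sub(/^\//, '').split('/')
    # a[0...-1] is a new array, but it holds the same string objects,
    # so mutating them also changes the elements of a.
    a[0...-1].each { |e| e << '/'; e.insert(0, '/') }
    a
  },
  "scan" => lambda { |s|
    s.scan(re).map { |(val, slash)| slash ? "/#{val}/" : val }
  },
  "splat" => lambda { |s|
    a = s[1..-1].split('/')
    [*a[0..-2].map { |e| "/#{e}/" }, a[-1]]
  },
  "tap" => lambda { |s|
    a = s[1..-1].split('/')
    a << (a.pop.tap { a.map! { |e| "/#{e}/" } })
  }
}

expected = ["/server/", "/ab/", "file.html"]
variants.each do |name, fn|
  puts "#{name}: #{fn.call(str) == expected ? 'ok' : 'MISMATCH'}"
end
```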

Upvotes: 2

pangpang

Reputation: 8821

str = str[1..-1].split('/')
=> ["server", "ab", "file.html"]
str[0...-1].map!{|e| "/#{e}/"} << str[-1]
=> ["/server/", "/ab/", "file.html"]

Upvotes: 0
