Yingqi

Reputation: 1535

How to return the same result from str.scan and str.match in Ruby using regex

I have a method to capture the extension as a group using a regex:

def test(str)
  word_match = str.match(/\.(\w*)/)
  word_scan = str.scan(/\.(\w*)/)
  puts word_match, word_scan
end

test("test.rb")

This prints:

.rb
rb

Why would I get a different answer?

Upvotes: 0

Views: 71

Answers (2)

the Tin Man

Reputation: 160571

Don't write your own code for this; take advantage of Ruby's built-in File.extname:

File.extname("test.rb")         # => ".rb"
File.extname("a/b/d/test.rb")   # => ".rb"
File.extname(".a/b/d/test.rb")  # => ".rb"
File.extname("foo.")            # => "."
File.extname("test")            # => ""
File.extname(".profile")        # => ""
File.extname(".profile.sh")     # => ".sh"

You're missing some cases. Compare the above to the output of your attempts:

fnames = %w[
  test.rb
  a/b/d/test.rb
  .a/b/d/test.rb
  foo.
  test
  .profile
  .profile.sh
]

fnames.map { |fn|
  fn.match(/\.(\w*)/).to_s 
}
# => [".rb", ".rb", ".a", ".", "", ".profile", ".profile"]

fnames.map { |fn|
  fn.scan(/\.(\w*)/).to_s  
}
# => ["[[\"rb\"]]",
#     "[[\"rb\"]]",
#     "[[\"a\"], [\"rb\"]]",
#     "[[\"\"]]",
#     "[]",
#     "[[\"profile\"]]",
#     "[[\"profile\"], [\"sh\"]]"]

The documentation for File.extname says:

Returns the extension (the portion of file name in path starting from the last period).

If path is a dotfile, or starts with a period, then the starting dot is not dealt with the start of the extension.

An empty string will also be returned when the period is the last character in path.

On Windows, trailing dots are truncated.

The File class has many more useful methods to pick apart filenames. There's also the Pathname class which is very useful for similar things.
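As a rough illustration of those other helpers, the sketch below splits one path with File and then with Pathname. The sample path and the variable names are made up for the example; only the File and Pathname methods themselves come from the standard library.

```ruby
require "pathname"

path = "a/b/d/test.rb"               # sample path, chosen for illustration

# File class methods work directly on strings.
dir  = File.dirname(path)            # directory portion
base = File.basename(path)           # file name including the extension
stem = File.basename(path, ".*")     # file name with any extension stripped
ext  = File.extname(path)            # extension including the leading dot

# Pathname wraps the same operations in an object.
pn      = Pathname.new(path)
pn_dir  = pn.dirname.to_s
pn_base = pn.basename.to_s
pn_ext  = pn.extname

p [dir, base, stem, ext]             # => ["a/b/d", "test.rb", "test", ".rb"]
p [pn_dir, pn_base, pn_ext]          # => ["a/b/d", "test.rb", ".rb"]
```

Pathname#dirname and Pathname#basename return Pathname objects, hence the to_s calls; Pathname#extname already returns a String.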

Upvotes: 2

Tyler Ferraro

Reputation: 3772

The reason is that match and scan return different objects. match returns either a MatchData object or nil (when nothing matches), while scan returns an Array. You can see this by calling the class method on your variables:

puts word_match.class # => MatchData
puts word_scan.class  # => Array

If you take a look at the to_s method on MatchData you'll notice it returns the entire matched string, rather than the captures. If you wanted just the captures you could use the captures method.

p word_match.captures          # => ["rb"]
puts word_match.captures.class # => Array
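To make the difference between the whole match and the captures concrete, here is a small sketch using the regex from the question. The variable names are only for illustration.

```ruby
m = "test.rb".match(/\.(\w*)/)

whole    = m.to_s       # the entire matched text
also_all = m[0]         # index 0 is the entire match as well
group1   = m[1]         # index 1 is the first capture group
caps     = m.captures   # all capture groups, as an Array

p whole     # => ".rb"
p also_all  # => ".rb"
p group1    # => "rb"
p caps      # => ["rb"]
```

puts calls to_s on its arguments, which is why puts word_match prints .rb in the question.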

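If you pass a block to match, it returns the block's result instead of the MatchData object. Returning the captures from the block gives you an Array of Strings, which is the same shape as each element of the Array that scan returns.

word_match = str.match(/\.(\w*)/) { |m| m.captures } # => ["rb"]
puts word_scan.inspect  # => [["rb"]]
puts word_match         # prints: rb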
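If the goal is for both approaches to hand back the same value, one possible sketch is below. The helper names (captures_via_match, captures_via_scan, first_capture) are hypothetical and not part of Ruby; they only wrap String#match, String#scan, and String#[] with the question's regex as a default argument.

```ruby
# Hypothetical helpers: each returns the capture(s) of the first match,
# or nil when the string does not match at all.

def captures_via_match(str, re = /\.(\w*)/)
  # &. skips the call when match returns nil
  str.match(re)&.captures
end

def captures_via_scan(str, re = /\.(\w*)/)
  # scan returns one inner Array per match; take the first match only
  str.scan(re).first
end

def first_capture(str, re = /\.(\w*)/)
  # String#[] with a regex and a group index returns that group as a String
  str[re, 1]
end

p captures_via_match("test.rb")  # => ["rb"]
p captures_via_scan("test.rb")   # => ["rb"]
p first_capture("test.rb")       # => "rb"
p captures_via_match("test")     # => nil
p captures_via_scan("test")      # => nil
```

The safe-navigation operator (&.) needs Ruby 2.3 or later. On older versions, the block form of match shown above gives the same result.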

More information on these methods and how they work can be found in the ruby-doc for the String class.

Upvotes: 4
