Patrick

Reputation: 1428

Why is this Regex result unexpected?

The regex in question is

/(<iframe.*?><\/iframe>)/

I am using this Ruby regex to match sections of a string, then creating an array of the results.

The string is

"<p><iframe src=\"http://www.dailymotion.com/embed/video/k18WBkRTMldXzB7JYW5?logo=0&#038;info=0\" frameborder=\"0\" height=\"450\" width=\"580\"></iframe></p>\n<p>#1<br />\n<iframe src=\"https://www.cloudy.ec/embed.php?id=cabe5d3ba31da\" allowfullscreen=\"\" frameborder=\"0\" height=\"420\" width=\"640\"></iframe></p>\n<p>#2<br />\n<iframe src=\"https://www.cloudy.ec/embed.php?id=b03d31e4b5663\" allowfullscreen=\"\" frameborder=\"0\" height=\"420\" width=\"640\"></iframe></p>\n<p>#3<br />\n<iframe src=\"https://www.cloudy.ec/embed.php?id=f63895add1aac\" allowfullscreen=\"\" frameborder=\"0\" height=\"420\" width=\"640\"></iframe></p>\n"

I am calling .match() on the regex like so

/(<iframe.*?><\/iframe>)/.match(entry.content).to_a

The result is a duplicate of the first match

["<iframe src=\"http://www.dailymotion.com/embed/video/k18WBkRTMldXzB7JYW5?logo=0&#038;info=0\" frameborder=\"0\" height=\"450\" width=\"580\"></iframe>", "<iframe src=\"http://www.dailymotion.com/embed/video/k18WBkRTMldXzB7JYW5?logo=0&#038;info=0\" frameborder=\"0\" height=\"450\" width=\"580\"></iframe>"]

I used Rubular and I was able to get the Regex to work there http://rubular.com/r/CYF0vgQtrX

Upvotes: 0

Views: 84

Answers (2)

7stud

Reputation: 48649

The result is a duplicate of the first match

Even though the docs for Regexp#match() do a horrible job of describing what match() does, it actually finds only the first match:

str = "abc"
md = /./.match(str)
p md.to_a

--output:--
["a"]

Regexp#match() returns a MatchData object when there is a match. A MatchData object holds the whole match plus whatever matched each group. If you call to_a() on a MatchData object, the return value is an Array containing the whole match followed by whatever matched each group in the regex:

str = "abc"
md = /(.)(.)(.)/.match(str)
p md.to_a

--output:--
["abc", "a", "b", "c"]

Because you specified a group in your regex, one result is the whole match, and the other result is what matched your group.
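
As a minimal sketch of the same effect, here is a shortened, made-up string (the html value below is an assumption for illustration, not your actual data). It suggests that the two-element array is the whole match followed by the single capture group, not two separate matches:

```ruby
# Hypothetical, shortened input with two iframes (not the original string).
html = '<p><iframe src="a"></iframe></p><p><iframe src="b"></iframe></p>'

md = /(<iframe.*?><\/iframe>)/.match(html)

whole_match = md[0]   # the text matched by the entire regex
first_group = md[1]   # the text matched by the parenthesized group

# The group wraps the entire pattern, so both entries are identical.
# The second iframe never appears because match stops at the first hit.
p md.to_a
p whole_match == first_group
```

Dropping the parentheses from the pattern should leave to_a with a single element, since there would no longer be a group to report.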

[A regex] was the first approach I thought of. If this wasn't going to work, then I was going to use nokogiri

From now on, nokogiri should be your first thought...because:

If you have a programming problem, and you think, "I'll use a regex", now you have two problems.

Upvotes: 2

Green Su

Reputation: 2348

You should use scan instead of match here.

entry.content.scan(/<iframe.*?><\/iframe>/)

Passing the grouped pattern /(<iframe.*?><\/iframe>)/ to scan will return a 2D array instead. The documentation for scan says:

If the pattern contains groups, each individual result is itself an array containing one entry per group.
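
A small sketch of the difference, using a shortened, hypothetical string rather than the one from the question (the names html, flat and nested are made up for illustration):

```ruby
# Hypothetical, shortened input with two iframes (not the original string).
html = '<p><iframe src="a"></iframe></p><p><iframe src="b"></iframe></p>'

# No groups: scan returns a flat array with one string per match.
flat = html.scan(/<iframe.*?><\/iframe>/)

# One group: scan returns an array of arrays, one inner array per match,
# each holding one entry per group.
nested = html.scan(/(<iframe.*?><\/iframe>)/)

p flat
p nested
p nested.flatten == flat
```

If you need to keep the group for some reason, calling flatten on the result is one way to get back to a flat list of strings.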

Upvotes: 1
