user2543457

Reputation: 39

Elegant way to extract information in Ruby regex

My question is simple. Here is the line:

   <title><* page.title *></title>

I want to get the "page.title" part. I can do that in two steps:

replacement = line.match(/\<\* .* \*\>/)
replacement_contain = replacement.to_s.match(/ .* /).to_s.strip

Is there a shortcut or a better way to do this?

Upvotes: 0

Views: 43

Answers (3)

Cary Swoveland

Reputation: 110755

One way is to use a capture group:

str = "<title><* page.title *></title>"

str[/\*\s+(.*)\s+\*/,1]
  #=> "page.title"

The regular expression says to match:

\*   : one asterisk, followed by
\s+  : one or more whitespace characters, followed by capture group #1
(.*) : which greedily matches all characters until it reaches the last
\s+  : run of one or more whitespace characters in the line that is followed by
\*   : an asterisk

The second argument, 1, tells String#[] to return the content of capture group #1 rather than the whole match.
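
As a variation on the above (not part of the original answer): because `.*` is greedy, a line holding two placeholders would be captured from the first one through to the last. A minimal sketch, assuming the delimiters are always `<*` and `*>` and that placeholders never nest; the constant `PLACEHOLDER` and the helper `placeholders` are hypothetical names chosen for illustration:

```ruby
# Hypothetical pattern: "<*", optional whitespace, a lazy capture,
# optional whitespace, then "*>". Assumes a placeholder never contains "*>".
PLACEHOLDER = /<\*\s*(.*?)\s*\*>/

# Hypothetical helper: returns the contents of every placeholder in a line.
def placeholders(line)
  # With one capture group, String#scan returns an array of one-element arrays.
  line.scan(PLACEHOLDER).flatten
end

single = "<title><* page.title *></title>"
double = "<h1><* page.title *></h1><p><* page.body *></p>"

first_name = single[PLACEHOLDER, 1]
  #=> "page.title"
all_names = placeholders(double)
  #=> ["page.title", "page.body"]
```

The lazy quantifier `.*?` stops at the first closing `*>` instead of the last, which is what keeps the two placeholders separate.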

Upvotes: 1

tckmn

Reputation: 59363

require 'nokogiri'
require 'open-uri'

# On Ruby 3.0+ Kernel#open no longer opens URLs; open-uri provides URI.open instead.
html = Nokogiri.HTML(URI.open('https://stackoverflow.com/questions/27879967/elegant-way-to-extarct-information-ruby-regex'))

puts html.css('title').text
# => "Elegant way to extarct information ruby regex - Stack Overflow"

The answer to "how do I parse HTML with regex" is "don't, unless you know it will conform to strict XML rules."

For example, @sawa's and @Cary's solutions, while okay if you know what content your HTML will contain, fail if you have *> anywhere else in your page, which is perfectly valid HTML. Use an HTML parser like Nokogiri instead (demonstrated above).
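
To make that failure mode concrete, here is a small sketch. The input line is invented for illustration (it is not from the question) and adds a comment containing a stray `*>` after the placeholder:

```ruby
# Invented input: a placeholder followed by a comment that also contains "*>".
line = "<title><* page.title *></title> <!-- note *> here -->"

# Capture-group approach: the greedy (.*) runs on to the last whitespace + "*".
greedy = line[/\*\s+(.*)\s+\*/, 1]
  #=> "page.title *></title> <!-- note"

# Lookaround approach: .* is greedy here too, so it stops at the last "*".
lookaround = line[/(?<=\*).*(?=\*)/].strip
  #=> "page.title *></title> <!-- note"
```

Both patterns return far more than "page.title" as soon as a second asterisk appears later in the line.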

Upvotes: 2

sawa

Reputation: 168269

"   <title><* page.title *></title>  "[/(?<=\*).*(?=\*)/].strip #=> "page.title"

Upvotes: 1
