Reputation: 39
My question is simple. Here is the line:
<title><* page.title *></title>
I want to get the "page.title" part. I can do that with the following:
replacement = line.match(/\<\* .* \*\>/)
replacement_contain = replacement.to_s.match(/ .* /).to_s.strip
Is there a shortcut or a better way to do this?
Upvotes: 0
Views: 43
Reputation: 110755
One way is to use a capture group:
str = "<title><* page.title *></title>"
str[/\*\s+(.*)\s+\*/,1]
#=> "page.title"
The regular expression says to match:
\* : one asterisk, followed by
\s+ : one or more whitespace characters, followed by capture group #1
(.*) : which matches all characters until it reaches the last
\s+ : run of one or more whitespace characters in the line that is followed by
\* : an asterisk
The second argument, 1, selects capture group #1, whose content is extracted and returned by String#[].
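As a possible variation on the same idea, here is a minimal sketch that uses a non-greedy quantifier and a named capture group, so the match stops at the first closing delimiter and no strip is needed. The names PLACEHOLDER and placeholder_name are hypothetical, chosen only for illustration, and the sketch assumes placeholders always look like "<* name *>".

```ruby
# Hypothetical pattern: "<*", optional whitespace, the shortest possible
# name, optional whitespace, then "*>".
PLACEHOLDER = /<\*\s*(?<name>.*?)\s*\*>/

# Hypothetical helper: returns the placeholder name, or nil if the line has none.
def placeholder_name(line)
  # String#[] with a regexp and a group name returns that group's text.
  line[PLACEHOLDER, "name"]
end

str = "<title><* page.title *></title>"
puts placeholder_name(str)
#=> page.title
```

Anchoring on both "<*" and "*>" is a design choice here: it keeps a stray asterisk elsewhere on the line from being treated as a delimiter.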
Upvotes: 1
Reputation: 59363
require 'nokogiri'
require 'open-uri'
# URI.open is required on Ruby 3.0+, where Kernel#open no longer accepts URLs.
html = Nokogiri::HTML(URI.open('https://stackoverflow.com/questions/27879967/elegant-way-to-extarct-information-ruby-regex'))
puts html.css('title').text
# => "Elegant way to extarct information ruby regex - Stack Overflow"
The answer to "how do I parse HTML with regex" is "don't, unless you know it will conform to strict XML rules."
For example, @sawa's and @Cary's solutions, while okay if you know what content your HTML will contain, fail if you have *> anywhere else in your page, which is perfectly valid HTML. Use an HTML parser like Nokogiri instead (demonstrated above).
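To illustrate the failure mode, here is a small sketch using only the standard library. The input line is made up for the demonstration, and the narrower pattern at the end is an assumption about what the placeholders look like; it does not replace a real parser.

```ruby
# Made-up input: one placeholder plus a stray "*>" later on the same line.
line = "<title><* page.title *></title><p>a *> b</p>"

# Greedy capture: .* runs on to the last whitespace-plus-asterisk in the line.
greedy = line[/\*\s+(.*)\s+\*/, 1]
puts greedy
#=> page.title *></title><p>a

# Non-greedy capture anchored on both "<*" and "*>", collecting every match.
names = line.scan(/<\*\s*(.*?)\s*\*>/).flatten
p names
#=> ["page.title"]
```

The anchored, non-greedy pattern copes with this particular input, but it still rests on assumptions about the markup that a parser would not need.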
Upvotes: 2
Reputation: 168269
" <title><* page.title *></title> "[/(?<=\*).*(?=\*)/].strip #=> "page.title"
Upvotes: 1