Reputation: 889
I have a problem with some regular expressions in Ruby. This is the situation: Input text:
"NU POSTA aşa ceva pe Facebook! „Prostia se plăteşte”
Publicat la: 10.02.2015 10:20 Ultima actualizare: 10.02.2015 10:35
Adresa de e-mail la care vrei sa primesti STIREA atunci cand se intampla
Abonează-te
---- Here is some usefull text ---
Abonează-te
× Citeşte mai mult »
Adauga un comentariu"
I need a regular expression which can extract only the useful text between the two occurrences of the word "Abonează-te".
I tried this: result = result.gsub(/^[.]{*}\nAbonează-te/, '')
to remove the text from the start of the string up to the word 'Abonează-te', but it does not work. I have no idea how to solve this. Can you help me?
Upvotes: 2
Views: 335
Reputation: 70722
Your regex syntax is incorrect. A . inside of a character class matches a literal dot, and {*} matches an opening curly brace "zero or more" times followed by a closing curly brace.
You can match instead of replacing here.
s.match(/Abonează-te(.*?)Abonează-te/m)[1].strip()
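As a rough sketch of the same idea, assuming a shortened sample string (the text variable and its contents are made up for illustration, not the asker's real input): String#[] with a capture index returns nil rather than raising when the marker is missing, and the asker's original "strip the leading part" approach can be written with a lazy .*? under the /m flag.

```ruby
# Shortened stand-in for the question's input (hypothetical sample data).
text = "Header line\nAbonează-te\n---- Here is some useful text ---\nAbonează-te\nFooter line"

# With the /m flag, `.` also matches newlines, so the lazy group can span lines.
pattern = /Abonează-te(.*?)Abonează-te/m

# String#[] with a capture index returns nil instead of raising when nothing matches,
# and &. skips the strip call in that case.
extracted = text[pattern, 1]&.strip
missing   = "no marker here"[pattern, 1]&.strip

# The asker's approach, repaired: drop everything from the start of the string
# up to and including the first marker line.
without_head = text.sub(/\A.*?Abonează-te\n/m, '')

puts extracted
puts missing.inspect
puts without_head
```

Calling [1] directly on the result of s.match raises NoMethodError when there is no match, which is why the sketch goes through String#[] and the safe navigation operator instead.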
Upvotes: 1
Reputation: 368964
Instead of using a regular expression, you can use String#split, then take the second part:
s = "NU POSTA aşa ceva pe Facebook! „Prostia se plăteşte”
Publicat la: 10.02.2015 10:20 Ultima actualizare: 10.02.2015 10:35
Adresa de e-mail la care vrei sa primesti STIREA atunci cand se intampla
Abonează-te
---- Here is some usefull text ---
Abonează-te
× Citeşte mai mult »
Adauga un comentariu"
s.split('Abonează-te', 3)[1].strip # 3: at most 3 parts
# => "---- Here is some usefull text ---"
UPDATE
If you want to get multiple matches:
s = "NU
Abonează-te
-- Here's some
Abonează-te
text --
Abonează-te
comentariu"
s.split('Abonează-te')[1..-2].map(&:strip)
# => ["-- Here's some", "text --"]
Upvotes: 2
Reputation: 174696
You could use the string.scan function. You don't need string.gsub when you want to extract a particular piece of text.
> s = "NU POSTA aşa ceva pe Facebook! „Prostia se plăteşte”
" Publicat la: 10.02.2015 10:20 Ultima actualizare: 10.02.2015 10:35
" Adresa de e-mail la care vrei sa primesti STIREA atunci cand se intampla
" Abonează-te
" ---- Here is some usefull text ---
" Abonează-te
" × Citeşte mai mult »
" Adauga un comentariu"
=> "NU POSTA aşa ceva pe Facebook! „Prostia se plăteşte”\nPublicat la: 10.02.2015 10:20 Ultima actualizare: 10.02.2015 10:35\nAdresa de e-mail la care vrei sa primesti STIREA atunci cand se intampla\nAbonează-te\n---- Here is some usefull text --- \nAbonează-te\n× Citeşte mai mult »\nAdauga un comentariu"
irb(main):010:0> s.scan(/(?<=Abonează-te\n)[\s\S]*?(?=\nAbonează-te)/)
=> ["---- Here is some usefull text --- "]
Remove the newline character \n from inside the lookarounds if necessary. [\s\S]*? does a non-greedy match of any character (whitespace or non-whitespace), zero or more times.
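To illustrate why the lookarounds matter, here is a small sketch on a made-up three-marker string (the text variable is an assumption for the example). Lookarounds are zero-width, so a marker that closes one section can still open the next one; a pattern that consumes the markers finds only the first section.

```ruby
# Hypothetical input with three markers, i.e. two sections between them.
text = "NU\nAbonează-te\n-- Here's some\nAbonează-te\ntext --\nAbonează-te\ncomentariu"

# Zero-width lookbehind/lookahead: the markers are checked but not consumed,
# so the middle marker ends the first section and starts the second.
with_lookarounds = text.scan(/(?<=Abonează-te)[\s\S]*?(?=Abonează-te)/).map(&:strip)

# Consuming version: the middle marker is used up by the first match,
# so only one section is found. scan returns one array per match when
# the pattern has a capture group, hence the flatten.
consuming = text.scan(/Abonează-te(.*?)Abonează-te/m).flatten.map(&:strip)

p with_lookarounds
p consuming
```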
Upvotes: 2