kitz
kitz

Reputation: 889

Ruby remove parts of a string

I have a problem with some regular expressions in Ruby. This is the situation: Input text:

"NU POSTA aşa ceva pe Facebook! „Prostia se plăteşte”
Publicat la: 10.02.2015 10:20 Ultima actualizare: 10.02.2015 10:35
Adresa de e-mail la care vrei sa primesti STIREA atunci cand se intampla
Abonează-te
---- Here is some usefull text --- 
Abonează-te
× Citeşte mai mult »
Adauga un comentariu"

I need a regular expression witch can extract only useful text between "Abonează-te" word.

I tried this result = result.gsub(/^[.]{*}\nAbonează-te/, '') to remove the text from the start of the string to the 'Abonează-te' word, but this does not work. I have no ideea how to solve this situation. Can you help me?

Upvotes: 2

Views: 335

Answers (3)

hwnd
hwnd

Reputation: 70722

Your regex syntax is incorrect . inside of a character class means match a dot literally, and the {*} matches an opening curly brace "zero or more" times followed by a closing curly brace.

You can match instead of replacing here.

s.match(/Abonează-te(.*?)Abonează-te/m)[1].strip()

Upvotes: 1

falsetru
falsetru

Reputation: 368964

Instead of using regular expression, you can use String#split, then take the second part:

s = "NU POSTA aşa ceva pe Facebook! „Prostia se plăteşte”
Publicat la: 10.02.2015 10:20 Ultima actualizare: 10.02.2015 10:35
Adresa de e-mail la care vrei sa primesti STIREA atunci cand se intampla
Abonează-te
---- Here is some usefull text --- 
Abonează-te
× Citeşte mai mult »
Adauga un comentariu"
s.split('Abonează-te', 3)[1].strip  # 3: at most 3 parts
# => "---- Here is some usefull text ---"

UPDATE

If you want to get multiple matches:

s = "NU
Abonează-te
-- Here's some
Abonează-te
text --
Abonează-te
comentariu"
s.split('Abonează-te')[1..-2].map(&:strip)
# => ["-- Here's some", "text --"]

Upvotes: 2

Avinash Raj
Avinash Raj

Reputation: 174696

You could use string.scan function. You don't need to go for string.gsub function where you want to extract a particular text.

> s = "NU POSTA aşa ceva pe Facebook! „Prostia se plăteşte”
" Publicat la: 10.02.2015 10:20 Ultima actualizare: 10.02.2015 10:35
" Adresa de e-mail la care vrei sa primesti STIREA atunci cand se intampla
" Abonează-te
" ---- Here is some usefull text --- 
" Abonează-te
" × Citeşte mai mult »
" Adauga un comentariu"
=> "NU POSTA aşa ceva pe Facebook! „Prostia se plăteşte”\nPublicat la: 10.02.2015 10:20 Ultima actualizare: 10.02.2015 10:35\nAdresa de e-mail la care vrei sa primesti STIREA atunci cand se intampla\nAbonează-te\n---- Here is some usefull text --- \nAbonează-te\n× Citeşte mai mult »\nAdauga un comentariu"
irb(main):010:0> s.scan(/(?<=Abonează-te\n)[\s\S]*?(?=\nAbonează-te)/)
=> ["---- Here is some usefull text --- "]

Remove the newline \n character present inside the lookarounds if necessary. [\s\S]*? will do a non-greedy match of space or non-space characters zero or more times.

DEMO

Upvotes: 2

Related Questions