How do I remove a substring from a string in Ruby?

Question

I have the following string, and I want to remove everything between the <EMAIL> tags, including the tags themselves:

"Great, I will send you something at 888@gmail.com.
    <EMAIL>T888@gmail.com
Quick note on 888@gmail.com
      Hi, just dropping you a quick note.</EMAIL>"

I use the following to remove it:

string = string.gsub(/<EMAIL>(.*)<\/EMAIL>/, '').strip

It does not work.

When I remove the \n from the string (I'd prefer not to, because it makes formatting and inputting more limiting), I get the following:

=> "Great, I will send you something at 888@gmail.com."

In other words, it works when I remove that.

How do I change my gsub statement to accommodate the \n, and why does it cause the failure?

the Tin Man · Accepted Answer

What you're doing can work, but it's very fragile, and as a result is not recommended. Instead, use a parser like Nokogiri:

require 'nokogiri'

str = "Great, I will send you something at 888@gmail.com.
    <EMAIL>T888@gmail.com
Quick note on 888@gmail.com
      Hi, just dropping you a quick note.</EMAIL>"

Here's how to parse the document:

doc = Nokogiri::XML::DocumentFragment.parse(str)

If the string were a well-formed XML document with a single root element, I could use a shorter method to parse:

doc = Nokogiri::XML(str)

Now find and remove the tag and its contents:

doc.at('EMAIL').remove
puts doc.to_xml
# >> Great, I will send you something at 888@gmail.com.

at finds the first <EMAIL> tag using a CSS selector. There are related methods that find all matching tags, or that are specific to CSS or XPath selectors.

XML/HTML parsers break the text down into nodes, making it easy to find things and manipulate them. The text can change, and as long as it's valid HTML or XML, properly written code will continue to work.

See the obligatory "RegEx match open tags except XHTML self-contained tags".

Regular expressions break down badly if there are embedded duplicate tags, something like:

<b>bold <i>italic <b>another bold</b></i></b>

Trying to strip the tags with patterns only would be painful. It's more easily done with a parser.
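
To make that failure concrete, here is a small sketch in plain Ruby. The sample markup and variable names are made up for illustration and are not from the question. A non-greedy pattern pairs the first opening tag with the nearest closing tag, which is the wrong one when tags of the same name are nested, while a greedy pattern overshoots when there are two separate elements:

```ruby
# Hypothetical markup with a <b> nested inside another <b>.
nested = "<b>bold <i>italic <b>another bold</b></i></b> tail"

# Non-greedy: the match ends at the first </b>, which closes the INNER tag,
# so the outer element's closing tags are left behind.
non_greedy = nested.gsub(%r#<b>.*?</b>#m, '')

# Hypothetical markup with two separate <b> elements.
separate = "<b>one</b> keep me <b>two</b>"

# Greedy: the match runs to the LAST </b>, swallowing the text in between.
greedy = separate.gsub(%r#<b>.*</b>#m, '')

puts non_greedy.inspect
puts greedy.inspect
```

Neither quantifier handles both inputs, which is why a parser that tracks nesting is the more robust choice.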

If I were absolutely bound and determined to do it without using a parser, this would work:

foo = "Great, I will send you something at 888@gmail.com. <EMAIL>asdf sdfg dfgh</EMAIL>"
foo.gsub(%r#<EMAIL>.*?</EMAIL>#im, '').strip # => "Great, I will send you something at 888@gmail.com."

Or:

foo.gsub(%r#\s*<EMAIL>.*?</EMAIL>\s*#im, '') # => "Great, I will send you something at 888@gmail.com."

I prefer the first of these two because it's visually clearer.

Use the i flag to make the pattern case-insensitive: it'll match both <EMAIL> and <email>. Use the m flag so that . also matches line-ends. By default . does not match a newline, so a pattern like .* stops at the end of each line and never reaches a closing tag on a later line. That is why the original gsub fails when the string contains \n.
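
A quick way to check the effect of each flag is to run the same substitution with and without it. This is only a sketch; the sample text and variable names are invented for the comparison:

```ruby
# Hypothetical multi-line input with an uppercase tag pair.
text = "Hello.\n<EMAIL>\nline one\nline two\n</EMAIL>"

# No flags: . cannot cross "\n", so the pattern never matches and
# the string comes back unchanged.
no_m = text.gsub(%r#<EMAIL>.*?</EMAIL>#, '')

# m flag: . also matches "\n", so the whole element is removed.
with_m = text.gsub(%r#<EMAIL>.*?</EMAIL>#m, '').strip

# i flag added: a lowercase pattern still matches the uppercase tags.
with_i = text.gsub(%r#<email>.*?</email>#im, '').strip

puts no_m == text    # the substitution did nothing
puts with_m.inspect
puts with_i.inspect
```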

"I'd prefer not to because it makes formatting and inputting more limiting"

Sometimes it's easier to strip something like a trailing newline in the pattern, then re-add it later. If the choice is between maintaining a little Ruby code or a complicated pattern, I'd take the Ruby code. Patterns are powerful and I use them, but they're not the answer to everything.
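
One way to read that advice, sketched in plain Ruby: take the trailing newline off, run a simple pattern, then put the newline back. The input string and variable names here are assumptions made for the example, not part of the question:

```ruby
# Hypothetical input that ends with a newline we want to keep.
input = "Great, I will send you something at 888@gmail.com. <EMAIL>asdf</EMAIL>\n"

# Remember whether the newline was there, then work on a chomped copy.
had_newline = input.end_with?("\n")
body = input.chomp

# Keep the pattern simple; tidy the leftover whitespace in Ruby instead.
cleaned = body.gsub(%r#<EMAIL>.*?</EMAIL>#im, '').rstrip

# Re-add the newline only if the original had one.
cleaned << "\n" if had_newline

puts cleaned.inspect
```

The bookkeeping lives in a few readable lines of Ruby, and the pattern does not need to account for the trailing newline at all.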
