Leo Folsom

Reputation: 685

Ruby remove all substrings that begin with specific character

I would like to extract every substring of a string that begins with a pound sign and ends at a space or at the end of the string, removing everything else. I have a working solution, but I'm wondering if there's a more efficient (or equally efficient but less wordy) approach.

For example, I want to take "leo is #confused about #ruby #gsub" and turn it into "#confused #ruby #gsub".

Here is my solution for now, which involves arrays and subtraction.

strip_spaces = str.gsub(/\s+/, ' ').strip()
  => "leo is #confused about #ruby #gsub"
all_strings = strip_spaces.split(" ").to_a
  => ["leo", "is", "#confused", "about", "#ruby", "#gsub"]
non_hashtag_strings = strip_spaces.gsub(/(?:#(\w+))/) {""}.split(" ").to_a
  => ["leo", "is", "about"]
hashtag_strings = (all_strings - non_hashtag_strings).join(" ")
  => "#confused #ruby #gsub"

To be honest, now that I'm done writing this question, I've learned a few things through research/experimentation and become more comfortable with this array approach. But I still wonder if anyone could recommend an improvement.

Upvotes: 1

Views: 525

Answers (4)

engineersmnky

Reputation: 29318

Always more ways to skin a cat

s = "leo is #confused about #ruby #gsub"
# gsub away all the words that do not start with a #
s.gsub(/(?<!\S)[^#\s]\S*\s*/,'')
#=> "#confused #ruby #gsub"
#split to Array and grab all the strings that start with #
s.split.grep(/\A#/).join(' ')
#=> "#confused #ruby #gsub"
#split to Array and separate them into 2 groups
starts_with_hash,others = s.split.partition {|e| e.start_with?('#') }
#=>[["#confused", "#ruby", "#gsub"], ["leo", "is", "about"]]
starts_with_hash.join(' ') 
#=> "#confused #ruby #gsub"

Benchmarks of these and the other answers, run with the fruity gem:

require 'fruity'

def split_start_with(s)
  s.split.select {|e| e.start_with?("#")}.join(' ')
end

def with_scan(s)
  s.scan(/#\w+/).join(' ')
end

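def with_gsub(s)
  # Note: this pattern deletes the hashtag words and returns the rest,
  # unlike the other three methods, which return the hashtags.
  s.gsub(/(?<=^|\s)#\w+\s?/,'')
end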

def split_grep(s)
  s.split.grep(/\A#/).join(' ')
end

str = "This is a reasonable string #withhashtags where I want to #test multiple #stringparsing #methods for separating and joinging #hastagstrings together for #speed"

compare do 
  split_start_with_test {split_start_with(str)}
  with_scan_test {with_scan(str)}
  with_gsub_test {with_gsub(str)}
  split_grep_test {split_grep(str)}
end

Results:

Running each test 262144 times. Test will take about 5 minutes.
split_start_with_test is similar to with_scan_test
with_scan_test is faster than with_gsub_test by 60.00000000000001% ± 1.0%
with_gsub_test is faster than split_grep_test by 30.000000000000004% ± 1.0%
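
Before reading too much into the timings, it may be worth confirming that the candidates return the same string. The following is only a sketch: the constant names SAMPLE, EMBEDDED, RESULTS and EMBEDDED_RESULTS are made up for illustration, and the methods are copied from the benchmark above so the snippet runs on its own.

```ruby
# Hypothetical sanity check: do the extraction methods agree with each other?
def split_start_with(s)
  s.split.select { |e| e.start_with?("#") }.join(' ')
end

def with_scan(s)
  s.scan(/#\w+/).join(' ')
end

def split_grep(s)
  s.split.grep(/\A#/).join(' ')
end

# The string from the question.
SAMPLE = "leo is #confused about #ruby #gsub"

# An assumed edge case: a '#' in the middle of a word.
EMBEDDED = "a#b #c"

RESULTS = {
  split_start_with: split_start_with(SAMPLE),
  with_scan:        with_scan(SAMPLE),
  split_grep:       split_grep(SAMPLE)
}

EMBEDDED_RESULTS = {
  split_start_with: split_start_with(EMBEDDED),
  with_scan:        with_scan(EMBEDDED),
  split_grep:       split_grep(EMBEDDED)
}

RESULTS.each { |name, out| puts "#{name}: #{out.inspect}" }
EMBEDDED_RESULTS.each { |name, out| puts "#{name}: #{out.inspect}" }
```

On the question's string all three agree. On the assumed edge case, with_scan also picks up the "#b" embedded in "a#b", while the two split-based methods only return "#c", so they are not strictly interchangeable.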

Upvotes: 2

mikdiet

Reputation: 10018

Regexp-only solution:

string = "leo is #confused about #ruby #gsub"
string.scan(/#\w+/)
#  => ["#confused", "#ruby", "#gsub"] 

If a # sign can appear inside a word, the regexp needs to be slightly more complex:

string = "leo is #confused ab#out #ruby #gsub"
string.scan(/(?<=\s)#\w+/)
#  => ["#confused", "#ruby", "#gsub"] 
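
One case that might matter: a lookbehind for whitespace cannot match a hashtag at the very start of the string, because nothing precedes it. A possible variant, shown here as a sketch on an assumed input (TEXT, AFTER_SPACE and NOT_AFTER_WORD are illustrative names), is a negative lookbehind for a non-whitespace character:

```ruby
# Assumed input: a hashtag at the very start of the string and a '#' inside a word.
TEXT = "#leo is #confused ab#out #ruby"

# Requires a whitespace character right before '#',
# so the leading "#leo" is skipped.
AFTER_SPACE = TEXT.scan(/(?<=\s)#\w+/)

# Requires only that no non-whitespace character precedes '#',
# which also holds at the start of the string.
NOT_AFTER_WORD = TEXT.scan(/(?<!\S)#\w+/)

p AFTER_SPACE     # ["#confused", "#ruby"]
p NOT_AFTER_WORD  # ["#leo", "#confused", "#ruby"]
```

Both patterns ignore the "#out" inside "ab#out"; they differ only in how they treat the first word.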

Upvotes: 3

spickermann

Reputation: 106802

I would do something like this:

string = "leo is #confused about #ruby #gsub"
#=> "leo is #confused about #ruby #gsub"
string.split.select { |word| word.start_with?('#') }.join(' ')
#=> "#confused #ruby #gsub"

Upvotes: 3

Richard Hamilton

Reputation: 26434

You could try this:

string.split(' ').select { |e| e.start_with?("#") }.join(' ')

Explanation

split(' ') - Breaks the string into an array of substrings, splitting on whitespace

select - Keeps only the array elements for which the block returns a true value

|e| e.start_with?("#") - The block passed to select; it is true only for substrings that start with a pound sign

join(' ') - Joins the array back into a single string, with a space between elements
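
To make the chain easier to follow, here is a sketch that runs each step separately on the string from the question. The constant names STRING, WORDS, HASHTAGS and RESULT are only for illustration.

```ruby
STRING = "leo is #confused about #ruby #gsub"

# Step 1: split on whitespace.
WORDS = STRING.split(' ')
# ["leo", "is", "#confused", "about", "#ruby", "#gsub"]

# Step 2: keep only the words that start with '#'.
HASHTAGS = WORDS.select { |e| e.start_with?("#") }
# ["#confused", "#ruby", "#gsub"]

# Step 3: join the remaining words back into one string.
RESULT = HASHTAGS.join(' ')
# "#confused #ruby #gsub"

puts RESULT
```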

Upvotes: 1
