Reputation: 5251

Remove a string pattern and symbols from string

I need to remove the substring "not" and hash signs (#) from a string. (I also have to strip spaces, downcase the text, and return the results in arrays, but I have those three taken care of.)

Expectation:

"not12345"       #=> ["12345"]
"   notabc  "    #=> ["abc"]
"notone, nottwo" #=> ["one", "two"]
"notCAPSLOCK"    #=> ["capslock"]
"##doublehash"   #=> ["doublehash"]
"h#a#s#h"        #=> ["hash"]
"#notswaggerest" #=> ["swaggerest"]

This is the code I have

def some_method(string)
    string.split(", ").map{|n| n.sub(/(not)/,"").downcase.strip}
end

All of the tests above do what I need except for the hash ones. I don't know how to get rid of the hashes; I have tried modifying the regex part: n.sub(/(#not)/), n.sub(/#(not)/), n.sub(/[#]*(not)/), to no avail. How can I make the regex remove #?

Upvotes: 2

Answers (5)

Cary Swoveland

Reputation: 110755

arr = ["not12345", "   notabc", "notone, nottwo", "notCAPSLOCK",
       "##doublehash:", "h#a#s#h", "#notswaggerest"]

arr.flat_map { |str| str.downcase.split(',').map { |s| s.gsub(/#|not|\s+/,"") } }
  #=> ["12345", "abc", "one", "two", "capslock", "doublehash:", "hash", "swaggerest"]

When the block variable str is set to "notone, nottwo",

s = str.downcase
  #=> "notone, nottwo" 
a = s.split(',')
  #=> ["notone", " nottwo"] 
b = a.map { |s| s.gsub(/#|not|\s+/,"") }
  #=> ["one", "two"]

Because I used Enumerable#flat_map, "one" and "two" are added individually to the array being returned rather than as a nested array. When str is "notCAPSLOCK",

s = str.downcase
  #=> "notcapslock" 
a = s.split(',')
  #=> ["notcapslock"] 
b = a.map { |s| s.gsub(/#|not|\s+/,"") }
  #=> ["capslock"]
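
To connect this back to the question, the same gsub can be dropped into a method that takes one string and returns an array. This is only a sketch: the method name clean_tokens is made up for illustration, and the regex is the one used above. Note that gsub removes every occurrence of "not" and all whitespace, so a word such as "knot" would become "k".

```ruby
# Hypothetical wrapper (the name is an assumption, not from the question).
# Downcase, split on commas, then strip "#", "not" and whitespace from each piece.
def clean_tokens(string)
  string.downcase.split(',').map { |s| s.gsub(/#|not|\s+/, '') }
end

p clean_tokens("notone, nottwo")  #=> ["one", "two"]
p clean_tokens("h#a#s#h")         #=> ["hash"]
p clean_tokens("#notswaggerest")  #=> ["swaggerest"]
```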

Upvotes: 3

engineersmnky

Reputation: 29613

Here is one more solution that, for the most part, captures what you want rather than dropping what you don't want:

a = ["not12345", "   notabc", "notone, nottwo",
     "notCAPSLOCK", "##doublehash:", "h#a#s#h", "#notswaggerest"]
a.map do |s|
  s.downcase.delete("#").scan(/(?<=not)\w+|^[^not]\w+/)
end
#=> [["12345"], ["abc"], ["one", "two"], ["capslock"], ["doublehash"], ["hash"], ["swaggerest"]]

I had to delete the # characters first because of "h#a#s#h"; otherwise delete could have been avoided with a regex like /(?<=not|^#[^not])\w+/
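
One thing worth checking before relying on this pattern: [^not] is a character class, so it matches any single character other than "n", "o" or "t"; it does not mean "anything except the word not". The sketch below wraps the scan from this answer in a helper (capture_words, a hypothetical name) and compares it with an alternative that is not part of the original answer (extract_words, also hypothetical), which scans every word and then strips a leading "not".

```ruby
# The scan from this answer, wrapped in a hypothetical helper.
def capture_words(str)
  str.downcase.delete("#").scan(/(?<=not)\w+|^[^not]\w+/)
end

# Alternative sketch (an assumption, not the answer's method):
# take every word, then drop a leading "not" from each one.
def extract_words(str)
  str.downcase.delete("#").scan(/\w+/).map { |word| word.sub(/\Anot/, "") }
end

p capture_words("notone, nottwo")  #=> ["one", "two"]
p capture_words("top")             #=> []  ("t" is excluded by [^not])
p extract_words("top")             #=> ["top"]
p extract_words("notone, nottwo")  #=> ["one", "two"]
```

For the inputs listed in the question both helpers give the same words; they differ only for words that start with "n", "o" or "t" and carry no "not" prefix.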

Upvotes: 2

Matt

Reputation: 646

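In Ruby regular expressions, # can be special: it starts a comment in extended (/x) mode, and #{...} triggers interpolation. To match a literal octothorpe (#) unambiguously, you can escape it: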

"#foo".sub(/\#/, "") #=> "foo"

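As a rough illustration of where the escape matters, the first lines below compare the unescaped, escaped and extended-mode forms. The method after them is the question's some_method with one extra gsub step added; that combination is a sketch of how the escape could be applied, not something taken from the question.

```ruby
p "#foo".sub(/#/, "")      #=> "foo"  (plain literal: # is an ordinary character here)
p "#foo".sub(/\#/, "")     #=> "foo"  (escaped form)
p "#foo".sub(/ \# /x, "")  #=> "foo"  (extended mode: an unescaped # would start a comment)

# The question's method with a gsub added to drop every "#".
# gsub is used because sub would only remove the first one.
def some_method(string)
  string.split(", ").map { |n| n.gsub(/\#/, "").sub(/not/, "").downcase.strip }
end

p some_method("h#a#s#h")         #=> ["hash"]
p some_method("#notswaggerest")  #=> ["swaggerest"]
```
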
Upvotes: 1

hirolau

Reputation: 13921

Fun problem because it can use the most common string functions in Ruby:

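values = ["not12345", "   notabc  ", "notone, nottwo", "notCAPSLOCK",
          "##doublehash", "h#a#s#h", "#notswaggerest"]

result = values.map do |string|
  string.strip       # Remove spaces in front and back.
    .downcase        # Lowercase everything, as the question requires.
    .tr('#', '')     # Transform single characters. In this case remove #
    .gsub('not', '') # Substitute patterns
    .split(', ')     # Split into arrays.
end

p result #=> [["12345"], ["abc"], ["one", "two"], ["capslock"], ["doublehash"], ["hash"], ["swaggerest"]]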

I prefer this to a regexp because the logic of each line is easy to follow.

Upvotes: 1

davidhu

Reputation: 10472

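You can use this regex to strip a leading run of whitespace, hashes, and "not". It handles the test cases where those appear at the start of the string; hashes inside the string, as in "h#a#s#h", are left untouched.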

/^\s*#*(not)*/

^ matches at the start of a line (in Ruby, \A is the anchor for the start of the string)
\s* matches any whitespace at the start
#* matches 0 or more #
(not)* matches the phrase "not" zero or more times.

Note: this regex won't work when "not" comes before "#"; for example, "not#hash" would become "#hash".
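
A short sketch of how the regex behaves with String#sub, including the limitation noted above. The helper name strip_leading_junk is hypothetical, and the final line, which deletes every "#" before applying the regex, is a suggested combination rather than part of this answer.

```ruby
# Hypothetical helper: apply the regex from this answer once with sub.
def strip_leading_junk(str)
  str.sub(/^\s*#*(not)*/, "")
end

p strip_leading_junk("#notswaggerest")       #=> "swaggerest"
p strip_leading_junk("##doublehash")         #=> "doublehash"
p strip_leading_junk("not#hash")             #=> "#hash"    ("#" after "not" survives)
p strip_leading_junk("h#a#s#h")              #=> "h#a#s#h"  (interior "#" survive)
p strip_leading_junk("h#a#s#h".delete("#"))  #=> "hash"
```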

Upvotes: 1
