curv

Reputation: 3844

Regular expression help

I am currently doing a bunch of processing on a string using regular expressions with gsub(), but I'm chaining the calls so heavily that it is starting to get messy. Can you help me construct a single regex for the following:

string.gsub(/\.com/,'').gsub(/\./,'').gsub(/&/,'and').gsub(' ','-').gsub("'",'').gsub(",",'').gsub(":",'').gsub("#39;",'').gsub("*",'').gsub("amp;",'')

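Basically, the above removes ".com", periods, apostrophes, commas, colons, asterisks and the fragments "#39;" and "amp;", and it replaces "&" with "and" and spaces with hyphens.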
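For reference, here is the question's chain wrapped in a method (the name chained_cleanup and the sample strings are only illustrative). It gives a baseline that any single-regex replacement should reproduce. Because "&" is turned into "and" before "amp;" is stripped, an escaped "&amp;" also ends up as "and".

```ruby
# Hypothetical wrapper around the question's chain, kept in the same order,
# so single-regex alternatives can be compared against it.
def chained_cleanup(string)
  string.gsub(/\.com/, '').gsub(/\./, '').gsub(/&/, 'and').gsub(' ', '-')
        .gsub("'", '').gsub(',', '').gsub(':', '').gsub('#39;', '')
        .gsub('*', '').gsub('amp;', '')
end

puts chained_cleanup("shop.example.com: Fish & Chips, Tom's*")
# => shopexample-Fish-and-Chips-Toms
puts chained_cleanup('Fish &amp; Chips')
# => Fish-and-Chips
```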

Is there an easier way to do this?

Upvotes: 0

Views: 166

Answers (3)

Xavier Holt

Reputation: 14619

Building on Tim's answer:

You can pass a block to String#gsub, so you could combine them all into a single call if you wanted:

string.gsub(/\.com|[.,:*& ']/) do |sub|
    case(sub)
    when '&'
        'and'
    when ' '
        '-'
    else
        ''
    end
end
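A minimal, self-contained sketch of the block version above, wrapped in a method whose name (slugify) is hypothetical. On this sample it produces the same result as the question's chain. Like the regex in the answer, it does not cover the "#39;" and "amp;" fragments, so HTML-escaped input would need extra alternatives.

```ruby
# Single gsub call: the block decides the replacement for each match.
def slugify(string)
  string.gsub(/\.com|[.,:*& ']/) do |sub|
    case sub
    when '&' then 'and'  # ampersand becomes the word "and"
    when ' ' then '-'    # spaces become hyphens
    else ''              # ".com" and the other listed characters are dropped
    end
  end
end

puts slugify("shop.example.com: Fish & Chips, Tom's*")
# => shopexample-Fish-and-Chips-Toms
```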

Or, building off echoback's answer, you could use a translation hash in the block (you may need to call translations.default = '' to get this working):

string.gsub(/\.com|[.,:*& ']/) {|sub| translations[sub]}

The biggest perk of using a block is that gsub, which is not the fastest method around, gets called only once.
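A sketch of the translation-hash variant, assuming a table named TRANSLATIONS (the name is illustrative). Hash.new('') sets the default to an empty string, which has the same effect as calling translations.default = ''. Ruby's String#gsub also accepts a Hash directly as the replacement argument, looking up each matched substring the same way, so the block can be dropped entirely.

```ruby
# Matches ".com" or any single character that should be replaced or removed.
PATTERN = /\.com|[.,:*& ']/

# Unlisted matches (".com", ".", ",", ":", "*", "'") fall back to ''.
TRANSLATIONS = Hash.new('')
TRANSLATIONS['&'] = 'and'
TRANSLATIONS[' '] = '-'

# Block form, as in the answer: one gsub call, one hash lookup per match.
def translate_with_block(string)
  string.gsub(PATTERN) { |sub| TRANSLATIONS[sub] }
end

# Hash form: gsub looks up each matched substring in the hash itself.
def translate_with_hash(string)
  string.gsub(PATTERN, TRANSLATIONS)
end

puts translate_with_block("shop.example.com: Fish & Chips, Tom's*")
# => shopexample-Fish-and-Chips-Toms
puts translate_with_hash("shop.example.com: Fish & Chips, Tom's*")
# => shopexample-Fish-and-Chips-Toms
```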

Hope this helps!

Upvotes: 0

wersimmon

Reputation: 2869

A translation table is more scalable as you add more options:

translations = Hash.new
translations['.com'] = ''
translations['&'] = 'and'
...

# gsub! modifies string in place; plain gsub returns a new string that would be thrown away
translations.each { |from, to| string.gsub!(from, to) }
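A fuller sketch of the translation-table idea, with the entries taken from the question's chain. Ruby hashes keep insertion order (since 1.9), so the replacements run in the listed order, which matters here: ".com" has to go before ".", and "&" before "amp;". String keys are matched literally, so "." and "*" need no escaping. The names REPLACEMENTS and clean are hypothetical, and reduce is used instead of gsub! so the caller's string is left untouched.

```ruby
# Ordered table mirroring the question's chain (plain strings, matched literally).
REPLACEMENTS = {
  '.com' => '',
  '.'    => '',
  '&'    => 'and',
  ' '    => '-',
  "'"    => '',
  ','    => '',
  ':'    => '',
  '#39;' => '',
  '*'    => '',
  'amp;' => ''
}

# Each step feeds its result into the next; the input string is never mutated.
def clean(string)
  REPLACEMENTS.reduce(string) { |acc, (from, to)| acc.gsub(from, to) }
end

puts clean("shop.example.com: Fish & Chips, Tom's*")
# => shopexample-Fish-and-Chips-Toms
```

Adding a new rule is then a one-line change to the table rather than another link in the chain.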

Upvotes: 1

Tim

Reputation: 14154

You can combine the ones that remove characters:

string.gsub(/\.com|[.,:*]/,'')

The pipe | means "or". The left side matches the literal ".com"; the right side is a character class, which means "any one of these characters" (inside the brackets, . and * are literal).
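A small sketch of the combined regex on a made-up string (the method name strip_punctuation is illustrative). It also shows why ".com" is listed first: Ruby tries alternatives left to right at each position, so with the order swapped the lone "." wins and "com" is left behind.

```ruby
# ".com" first, then any single one of . , : *
REMOVE = /\.com|[.,:*]/

def strip_punctuation(string)
  string.gsub(REMOVE, '')
end

puts strip_punctuation('shop.example.com: Fish, Tom*')
# => shopexample Fish Tom

# With the alternatives swapped, "." matches before ".com" gets a chance.
puts 'example.com'.gsub(/[.,:*]|\.com/, '')
# => examplecom
```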

Upvotes: 3
