Reputation: 42207
I want to do multiple regular expression replacements on an array. I have this working code, but it doesn't seem like the Ruby way. Does anyone have a better solution?
# files contains the strings that need cleaning
files = [
"Beatles - The Word ",
"The Beatles - The Word",
"Beatles - Tell Me Why",
"Beatles - Tell Me Why (remastered)",
"Beatles - Love me do"
]
# ignore contains the regular expressions that need to be checked
ignore = [/the/,/\(.*\)/,/remastered/,/live/,/remix/,/mix/,/acoustic/,/version/,/ +/]
files.each do |file|
  ignore.each do |e|
    file.downcase!
    file.gsub!(e," ")
    file.strip!
  end
end
p files
#=>["beatles - word", "beatles - word", "beatles - tell me why", "beatles - tell me why", "beatles - love me do"]
Upvotes: 1
Views: 172
Reputation: 42207
I made this solution from your answers, in two versions: one that converts to a string (and doesn't change the files array), and one that extends Array and does change the files array itself. The class approach is nearly 2x faster (0.187 s vs. 0.328 s for 10000 runs). If anyone still has suggestions, please share them.
files = [
"Beatles - The Word ",
"The Beatles - The Word",
"Beatles - Tell Me Why",
"The Beatles - Tell Me Why (remastered)",
"Beatles - wordwiththein wordwithlivein"
]
ignore = /\(.*\)|[_]|\b(the|remastered|live|remix|mix|acoustic|version)\b/
class Array
  def cleanup ignore
    self.each do |e|
      e.downcase!
      e.gsub!(ignore," ")
      e.gsub!(/ +/," ")
      e.strip!
    end
  end
end
p files.join("#").downcase.gsub(ignore," ").gsub(/ +/," ").split(/ *# */)
#=>["beatles - word", "beatles - word", "beatles - tell me why", "beatles - tell me why", "beatles - wordwiththein wordwithlivein"]
require 'benchmark'
Benchmark.bm do |x|
  x.report("string method") { 10000.times { files.join("#").downcase.gsub(ignore," ").gsub(/ +/," ").split(/ *# */) } }
  x.report("class method") { 10000.times { files.cleanup ignore } }
end
=begin
user system total real
string method 0.328000 0.000000 0.328000 ( 0.327600)
class method 0.187000 0.000000 0.187000 ( 0.187200)
=end
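One caveat when chaining bang methods such as downcase!: they return nil when the string needed no change, so the next call in the chain fails on nil. A minimal sketch of that behaviour (the variable names are only illustrative):

```ruby
# .dup gives us unfrozen copies that can safely be modified in place
mixed = "Beatles".dup
lower = "beatles".dup

changed   = mixed.downcase!   # string was modified, so the string is returned
unchanged = lower.downcase!   # nothing to modify, so nil is returned

p changed
p unchanged

# The non-bang version always returns a string, so it is safe to chain
safe = lower.downcase.gsub(/ +/, " ")
p safe
```

This matters here because the joined string is already lowercase once cleanup has modified the files array.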
Upvotes: 0
Reputation: 80075
ignore = ["the", "(", ".", "*", ")", "remastered", "live", "remix", "mix", "acoustic", "version", "+"]
re = Regexp.union(ignore)
p re #=> /the|\(|\.|\*|\)|remastered|live|remix|mix|acoustic|version|\+/
Regexp.union takes care of escaping.
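As a rough sketch of how the union could then be applied to the question's data (the `cleaned` variable, the shortened lists and the squeeze/strip step are my assumptions, not part of the original code). Note that with plain strings the parentheses are matched literally, and "the" still matches inside longer words:

```ruby
# Plain strings: Regexp.union escapes "(" and ")" for us
ignore = ["the", "(", ")", "remastered", "live", "remix", "mix", "acoustic", "version"]
re = Regexp.union(ignore)

files = ["The Beatles - The Word", "Beatles - Tell Me Why (remastered)"]

# Non-destructive: build a new array instead of changing files in place
cleaned = files.map { |f| f.downcase.gsub(re, " ").squeeze(" ").strip }
p cleaned
```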
Upvotes: 3
Reputation: 336368
You can put most of these in a single regex replace operation. Also, you should be using word boundary anchors (\b); otherwise, for example, the will also match in There's a Place.
file.gsub!(/(?:\b(?:the|remastered|live|remix|mix|acoustic|version)\b)|\([^()]*\)/, ' ')
should take care of this.
Then, you can strip multiple spaces in a second step:
file.gsub!(/ +/, ' ')
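Put together, the two steps might look like the sketch below. The names IGNORE and clean_title are hypothetical, chosen only for this example, and the sample titles are made up to show the word boundary effect:

```ruby
# Whole words from the ignore list, or a parenthesised group without nesting
IGNORE = /\b(?:the|remastered|live|remix|mix|acoustic|version)\b|\([^()]*\)/

# Hypothetical helper: returns a cleaned copy, leaves the argument untouched
def clean_title(title)
  title.downcase.gsub(IGNORE, " ").gsub(/ +/, " ").strip
end

titles = ["The Beatles - There's a Place (live)", "Beatles - Love me do"]
result = titles.map { |t| clean_title(t) }
p result
```

Because of the \b anchors, "there's" is left alone while the standalone "the" is removed.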
If you want to keep the regexes in an array, then you do need to iterate through the array and do the replacements for each regex. But you can at least take some commands out of the loop:
files.each do |file|
  file.downcase!
  ignore.each do |e|
    file.gsub!(e," ")
  end
  file.strip!
end
Of course, then you will need to put word boundaries around each word in your ignore list:
ignore = [/\bthe\b/, /\([^()]*\)/, /\bremastered\b/, ...]
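If typing the boundaries by hand gets tedious, the array could probably be generated from a plain word list. This is only a sketch: the `words` and `cleaned` names are assumptions, and it uses a non-destructive map/reduce rather than the in-place loop above:

```ruby
words = %w[the remastered live remix mix acoustic version]

# Wrap every word in \b...\b; Regexp.escape guards against special characters
ignore = words.map { |w| /\b#{Regexp.escape(w)}\b/ }
ignore << /\([^()]*\)/

files = ["The Beatles - The Word", "Beatles - Tell Me Why (remastered)"]

cleaned = files.map do |file|
  # Apply each regex in turn, then collapse runs of spaces and trim the ends
  ignore.reduce(file.downcase) { |s, e| s.gsub(e, " ") }.squeeze(" ").strip
end
p cleaned
```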
Upvotes: 1