Reputation: 4897
Suppose I have a string such as:
text = "1st RULE: You do not talk about FIGHT CLUB.
2nd RULE: You DO NOT talk about FIGHT CLUB.
3rd RULE: If someone says 'stop' or goes limp, taps out the fight is over.
4th RULE: Only two guys to a fight.
5th RULE: One fight at a time.
6th RULE: No shirts, no shoes.
7th RULE: Fights will go on as long as they have to.
8th RULE: If this is your first night at FIGHT CLUB, you HAVE to fight."
How can I remove certain characters from it? I would like to remove every character that matches /[0-9:;!\?(){}%$#@*-,.<>"'+=]/.
I'd like to count the frequency of each word in the text, but these pesky characters are giving me problems. I think the best solution would be to use:
text.map! do |word|
  if word =~ /[:;!\?(){}%$#@*-,.<>"'+=]/
    #do something that removes the character....
  end
end
I can't seem to find the right solution. I've looked through the docs and taken a swing at delete_if, delete_at, and drop_while, but they remove the entire element (the word), which is what I DON'T want. I've also tried String methods such as gsub with a blank space, but it doesn't work the way I think it should.
Can someone please direct me to the right track? I don't want to delete the entire element, just the instance of those matches.
When I think about it, gsub could work, but would I replace the match with whitespace? That would cause problems if the character were in the middle of the string I'm trying to clean up.
I intend to store them in a Hash.
Upvotes: 1
Views: 180
Reputation: 160551
Here are some interesting benchmark results for the patterns:
require 'fruity'
text = "1st RULE: You do not talk about FIGHT CLUB.
2nd RULE: You DO NOT talk about FIGHT CLUB.
3rd RULE: If someone says 'stop' or goes limp, taps out the fight is over.
4th RULE: Only two guys to a fight.
5th RULE: One fight at a time.
6th RULE: No shirts, no shoes.
7th RULE: Fights will go on as long as they have to.
8th RULE: If this is your first night at FIGHT CLUB, you HAVE to fight."
INCLUSIVE_PATTERN_STRING = %/[0-9:;!?(){}%$#@*,.<>"'+=-]/
EXCLUSIVE_PATTERN_STRING = %/[^a-z\n ]/
text.gsub(/#{ INCLUSIVE_PATTERN_STRING }/i, '') # => "st RULE You do not talk about FIGHT CLUB\nnd RULE You DO NOT talk about FIGHT CLUB\nrd RULE If someone says stop or goes limp taps out the fight is over\nth RULE Only two guys to a fight\nth RULE One fight at a time\nth RULE No shirts no shoes\nth RULE Fights will go on as long as they have to\nth RULE If this is your first night at FIGHT CLUB you HAVE to fight"
text.gsub(/#{ EXCLUSIVE_PATTERN_STRING }/i, '') # => "st RULE You do not talk about FIGHT CLUB\nnd RULE You DO NOT talk about FIGHT CLUB\nrd RULE If someone says stop or goes limp taps out the fight is over\nth RULE Only two guys to a fight\nth RULE One fight at a time\nth RULE No shirts no shoes\nth RULE Fights will go on as long as they have to\nth RULE If this is your first night at FIGHT CLUB you HAVE to fight"
text.gsub(/#{ INCLUSIVE_PATTERN_STRING }+/i, '') # => "st RULE You do not talk about FIGHT CLUB\nnd RULE You DO NOT talk about FIGHT CLUB\nrd RULE If someone says stop or goes limp taps out the fight is over\nth RULE Only two guys to a fight\nth RULE One fight at a time\nth RULE No shirts no shoes\nth RULE Fights will go on as long as they have to\nth RULE If this is your first night at FIGHT CLUB you HAVE to fight"
text.gsub(/#{ EXCLUSIVE_PATTERN_STRING }+/i, '') # => "st RULE You do not talk about FIGHT CLUB\nnd RULE You DO NOT talk about FIGHT CLUB\nrd RULE If someone says stop or goes limp taps out the fight is over\nth RULE Only two guys to a fight\nth RULE One fight at a time\nth RULE No shirts no shoes\nth RULE Fights will go on as long as they have to\nth RULE If this is your first night at FIGHT CLUB you HAVE to fight"
compare do
  inclusive { text.gsub(/#{ INCLUSIVE_PATTERN_STRING }/i, '') }
  exclusive { text.gsub(/#{ EXCLUSIVE_PATTERN_STRING }/i, '') }
  greedy_inclusive { text.gsub(/#{ INCLUSIVE_PATTERN_STRING }+/i, '') }
  greedy_exclusive { text.gsub(/#{ EXCLUSIVE_PATTERN_STRING }+/i, '') }
end
Running that results in:
Running each test 128 times. Test will take about 1 second.
inclusive is faster than exclusive by 30.000000000000004% ± 1.0%
exclusive is faster than greedy_exclusive by 10.000000000000009% ± 1.0%
greedy_exclusive is faster than greedy_inclusive by 10.000000000000009% ± 1.0%
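fruity is a third-party gem, so if it isn't installed, a rough comparison can be sketched with the standard library's Benchmark module instead. This is only a sketch under assumptions: the shortened sample text, the variable names, and the iteration count are mine, and the timings will differ from machine to machine, so no expected output is shown.

```ruby
require 'benchmark'

# Shortened sample text (an assumption; the full text works the same way).
text = "1st RULE: You do not talk about FIGHT CLUB.\n" \
       "2nd RULE: You DO NOT talk about FIGHT CLUB."

# The same four patterns as above, precompiled once.
inclusive        = /[0-9:;!?(){}%$#@*,.<>"'+=-]/
exclusive        = /[^a-z\n ]/i
greedy_inclusive = /[0-9:;!?(){}%$#@*,.<>"'+=-]+/
greedy_exclusive = /[^a-z\n ]+/i

iterations = 1_000 # arbitrary count, chosen only for illustration

Benchmark.bm(18) do |x|
  x.report('inclusive')        { iterations.times { text.gsub(inclusive, '') } }
  x.report('exclusive')        { iterations.times { text.gsub(exclusive, '') } }
  x.report('greedy_inclusive') { iterations.times { text.gsub(greedy_inclusive, '') } }
  x.report('greedy_exclusive') { iterations.times { text.gsub(greedy_exclusive, '') } }
end
```

Unlike fruity, Benchmark.bm only prints raw timings and does not state which variant is faster or by how much, so the numbers have to be compared by hand.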
Upvotes: 1
Reputation: 37517
[Converting my comment to an answer]
Use:
text.gsub(/[0-9:;!\?(){}%$#@*,.<>"'+=-]/, '')
This will remove the characters. Note that I moved the - to the end of the character class to prevent it from being interpreted as a range.
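To see why the position of the hyphen matters, here is a small sketch (the sample string is made up for illustration). Inside a character class, *-, is read as the range from * to , which happens to contain +, so a literal - is not part of the set unless it sits at the start or end of the class.

```ruby
# Hypothetical sample string, not taken from the question.
sample = "a-b+c*d,e"

# Hyphen in the middle: "*-," is a range covering "*", "+", and ",".
range_result = sample.gsub(/[*-,]/, '')

# Hyphen at the end: "*", ",", and "-" are three literal characters.
literal_result = sample.gsub(/[*,-]/, '')

puts range_result   # a-bcde
puts literal_result # ab+cde
```

In the first call the hyphen survives and the plus sign is removed, which is the opposite of what the pattern appears to say.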
It's possible you could simplify the regex by using a negated set (specifying those you want to keep):
text.gsub(/[^a-z ]/i, '')
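Since the goal in the question is a word count stored in a Hash, the cleaned string can then be split and tallied. The following is a minimal sketch, not the only way to do it. It assumes a shortened sample text and uses \s instead of a bare space in the negated set, because with multi-line input [^a-z ] also removes the newlines and would fuse the last word of one line with the first word of the next.

```ruby
# Shortened sample; the question's full text would work the same way.
text = "1st RULE: You do not talk about FIGHT CLUB.\n" \
       "2nd RULE: You DO NOT talk about FIGHT CLUB."

# Keep letters and whitespace, drop everything else.
cleaned = text.gsub(/[^a-z\s]/i, '')

# Count case-insensitively; Hash.new(0) gives missing keys a count of 0.
counts = Hash.new(0)
cleaned.downcase.split.each { |word| counts[word] += 1 }

puts counts["club"] # 2
puts counts["rule"] # 2
```

Note that stripping the digits leaves fragments such as "st" and "nd" behind as words of their own. If those should not be counted, one option is to drop the whole token instead, but that is a separate design decision.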
Upvotes: 3