LewlSauce

Reputation: 5882

Replacing characters that don't match a particular regular expression

I have the following regular expression from Amazon Web Services (AWS), which the Instance Name is required to match:

^([\p{L}\p{Z}\p{N}_.:/=+-@]*)$

However, I am unsure of an efficient way to find the characters that do not match this pattern and replace each of them with a single space character.

For example, the string Hello (World) should become Hello World (the parentheses have been replaced with spaces). This is just one of many examples of characters that do not match the pattern.

The only way I've been able to do this is by using the following code:

first_test_string.split('').each do |char|
    if char[/^([\p{L}\p{Z}\p{N}_.:\/=+-@]*)$/] == nil
        second_test_string = second_test_string.gsub(char, " ")
    end
end

When using this code, I get the following result:

irb(main):037:0> first_test_string = "Hello (World)"
=> "Hello (World)"
irb(main):038:0> second_test_string = first_test_string
=> "Hello (World)"
irb(main):039:0>
irb(main):040:0> first_test_string.split('').each do |char|
irb(main):041:1*     if char[/^([\p{L}\p{Z}\p{N}_.:\/=+-@]*)$/] == nil
irb(main):042:2>         second_test_string = second_test_string.gsub(char, " ")
irb(main):043:2>     end
irb(main):044:1> end
=> ["H", "e", "l", "l", "o", " ", "(", "W", "o", "r", "l", "d", ")"]
irb(main):045:0> first_test_string
=> "Hello (World)"
irb(main):046:0> second_test_string
=> "Hello  World "
irb(main):047:0>

Is there another way to do this, one that is less hacky? I was hoping for a solution where I could just provide a regex and then replace everything except the characters that match it.

Upvotes: 0

Views: 28

Answers (1)

Schwern

Reputation: 165198

Use String#gsub and negate the character class of acceptable characters with [^...].

2.6.5 :014 > "Hello (World)".gsub(%r{[^\p{L}\p{Z}\p{N}_.:/=+\-@]}, " ")
 => "Hello  World " 

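Note that I've also escaped the -, because inside a character class +-@ is interpreted as the range of characters from + to @. For example, the comma lies between + and @, so the unescaped version treats it as a valid character and leaves it in place.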

2.6.5 :004 > "Hello, World".gsub(%r{[^\p{L}\p{Z}\p{N}_.:/=+-@]+}, " ")
 => "Hello, World" 
2.6.5 :005 > "Hello, World".gsub(%r{[^\p{L}\p{Z}\p{N}_.:/=+\-@]+}, " ")
 => "Hello  World" 
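
To see exactly which characters the unescaped range lets through, the following sketch lists the code points from + to @ and compares the two patterns. The method names are hypothetical and exist only for this illustration.

```ruby
# Every character covered by the range +-@ (code points 0x2B through 0x40).
def range_chars
  ('+'.ord..'@'.ord).map { |cp| cp.chr(Encoding::UTF_8) }.join
end

# Unescaped hyphen: +-@ is a range, so , ; < > ? and others count as valid.
def replace_unescaped(str)
  str.gsub(/[^\p{L}\p{Z}\p{N}_.:\/=+-@]/, " ")
end

# Escaped hyphen: only the literal characters +, - and @ are added to the class.
def replace_escaped(str)
  str.gsub(/[^\p{L}\p{Z}\p{N}_.:\/=+\-@]/, " ")
end

puts range_chars                        # +,-./0123456789:;<=>?@
puts replace_unescaped("a,b;c<d>e?f")   # a,b;c<d>e?f
puts replace_escaped("a,b;c<d>e?f")     # a b c d e f
```

The unescaped form silently accepts , ; < > and ?, none of which are listed individually in the AWS pattern's character class.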

Add a + after the character class if you want each run of consecutive invalid characters to be replaced with a single space.

2.6.5 :024 > "((Hello~(World)))".gsub(%r{[^\p{L}\p{Z}\p{N}_.:/=+\-@]}, " ")
 => "  Hello  World   " 
2.6.5 :025 > "((Hello~(World)))".gsub(%r{[^\p{L}\p{Z}\p{N}_.:/=+\-@]+}, " ")
 => " Hello World " 
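
If this is needed in more than one place, both variants could be wrapped in a small helper. This is only a sketch: the method name sanitize_instance_name and the collapse: keyword are made up for the example, and the character class is the escaped one used above.

```ruby
# Hypothetical helper: replaces every character outside the allowed set with a space.
# With collapse: true, each run of invalid characters becomes a single space.
def sanitize_instance_name(name, collapse: false)
  invalid = if collapse
    /[^\p{L}\p{Z}\p{N}_.:\/=+\-@]+/
  else
    /[^\p{L}\p{Z}\p{N}_.:\/=+\-@]/
  end
  name.gsub(invalid, " ")
end

puts sanitize_instance_name("((Hello~(World)))").inspect                   # "  Hello  World   "
puts sanitize_instance_name("((Hello~(World)))", collapse: true).inspect   # " Hello World "

# Spaces already present in the input are valid and are kept, so "Hello (World)"
# still ends up with a double space. String#squeeze and String#strip tidy that up.
puts sanitize_instance_name("Hello (World)").squeeze(" ").strip.inspect    # "Hello World"
```

Whether to squeeze and strip depends on whether the original spacing matters. gsub alone only guarantees that every remaining character is in the allowed set.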

Upvotes: 1
