Reputation: 3338
I am parsing a file and I get this string:
"���email@locale"
How can I clean a string that contains these invalid characters ("�")?
Upvotes: 0
Views: 281
Reputation: 6041
There are multiple ways to remove unwanted characters.
"���email@locale".chars.select(&:ascii_only?).join
=> "email@locale"
"���email@locale".gsub(/[^\p{Ascii}]/, '')
=> "email@locale"
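One caveat, as a hedged sketch rather than part of the original answer: if the file contains genuinely invalid byte sequences (not just the U+FFFD replacement character), `gsub` with a regex raises an `ArgumentError` on the invalid string. Assuming Ruby 2.1 or later, `String#scrub` can drop those bytes first. The helper name `strip_non_ascii` is made up for illustration.

```ruby
# Hypothetical helper (name is an assumption, not from the answer above).
# Removes invalid byte sequences first, then every non-ASCII character.
def strip_non_ascii(str)
  # scrub('') returns a copy with invalid bytes replaced by the empty string,
  # so the regex below never sees an invalid string.
  str.scrub('').gsub(/[^\p{Ascii}]/, '')
end

# Valid UTF-8 containing three U+FFFD replacement characters
puts strip_non_ascii("\uFFFD\uFFFD\uFFFDemail@locale")  # prints email@locale

# Raw bytes that are not valid UTF-8
puts strip_non_ascii("\xFF\xFEemail@locale")            # prints email@locale
```

If the input is always valid UTF-8, the `scrub` call is a no-op and the one-liners above behave the same.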
Both of these will also strip non-ASCII characters from internationalized domain names, which are valid these days.
To keep only ASCII before the @ but allow anything after it, you can use something like this:
sanitized_email = "���email@locale"[/\p{Ascii}+?@[^\s]+/]
# now you can check if the email was valid at all:
raise "invalid email" if sanitized_email.nil?
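Wrapped up as a method, that could look roughly like the sketch below. The name `extract_email` and the choice of `ArgumentError` are assumptions for illustration, and it assumes the input is valid UTF-8.

```ruby
# Hypothetical wrapper around the regex above (name and error class are assumptions).
def extract_email(raw)
  # ASCII-only local part (lazy match up to the first @), then any
  # non-whitespace characters for the domain.
  email = raw[/\p{Ascii}+?@[^\s]+/]
  raise ArgumentError, "invalid email: #{raw.inspect}" if email.nil?
  email
end

puts extract_email("\uFFFD\uFFFD\uFFFDemail@locale")  # prints email@locale

# Non-ASCII characters after the @ are kept (\u00FC is "ü").
puts extract_email("\uFFFDuser@b\u00FCcher.example")

begin
  extract_email("no at sign here")
rescue ArgumentError => e
  puts e.message
end
```

This only checks that something shaped like `local@domain` is present; it does not validate the address any further.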
Upvotes: 1