Reputation: 15204
I recently discovered some flaws in my user records. Some of the registered emails contain characters in encodings other than UTF-8, so I'm trying to clean all those emails with gsub. For now I'm trying to capture all the flawed records using this regex. Explanation of the regex: http://regexr.com/3bati
/\A[^@\s]+@([^@\s]+\.)+[^@\W]+\z/
But I'm not able to capture the following string, which I inserted in the database as a flag:
"\[email protected]".encode('utf-8')
How can I improve this regex to tighten my validation and keep encoding problems from breaking my login?
Upvotes: 1
Views: 154
Reputation: 121000
As I understand your task, you want to make sure that the email entered by the user is what she wanted to enter. I would go with:
"\[email protected]".gsub(/[^\p{ASCII}]/, '').encode('ISO-8859-1')
First of all, you don’t need to ensure it’s a valid email; that is a different task. Second, all non-ASCII characters should be filtered out. That’s likely all there is to it.
Of course, you might apply any further email validation check.
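To illustrate why the regex from the question does not flag such records, and how stripping and validating can be combined, here is a minimal sketch. The address and the helper name `strip_non_ascii` are made up for the example; only the regex comes from the question.

```ruby
# The pattern from the question, reused unchanged.
EMAIL_FORMAT = /\A[^@\s]+@([^@\s]+\.)+[^@\W]+\z/

# Hypothetical helper: drop every character outside the ASCII range.
def strip_non_ascii(email)
  email.gsub(/[^\p{ASCII}]/, '')
end

dirty = "\u00E9user@example.com" # made-up address with a leading non-ASCII char
clean = strip_non_ascii(dirty)

# [^@\s]+ accepts any non-space, non-@ character, so the dirty value passes too.
puts dirty.match?(EMAIL_FORMAT)  # true
puts dirty.ascii_only?           # false
puts clean                       # user@example.com
puts clean.match?(EMAIL_FORMAT)  # true
```

If the goal is only to find the affected records, `String#ascii_only?` is probably a simpler test than a format regex, since the format regex was never meant to reject non-ASCII characters in the local part.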
NB: the #encode call at the end is there to ensure that a valid ISO-8859-1 string is left after sanitization.
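A small sketch of what that final conversion does, using a made-up address: a character with no ISO-8859-1 equivalent makes `encode` raise, while the stripped string converts cleanly.

```ruby
raw = "\u4E2Duser@example.com" # made-up address; U+4E2D has no ISO-8859-1 mapping

begin
  raw.encode('ISO-8859-1')
rescue Encoding::UndefinedConversionError => e
  # Converting the unsanitized string fails.
  puts "cannot convert: #{e.class}"
end

# After removing non-ASCII characters the conversion succeeds.
latin1 = raw.gsub(/[^\p{ASCII}]/, '').encode('ISO-8859-1')
puts latin1                 # user@example.com
puts latin1.encoding        # ISO-8859-1
puts latin1.valid_encoding? # true
```

One caveat, assuming the stored values may contain invalid byte sequences rather than merely non-ASCII characters: `gsub` with a regex raises `ArgumentError` on such strings, so calling `String#scrub` (Ruby 2.1+) before `gsub` may be needed.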
Upvotes: 1