Reputation: 687
I have a site where users can pick a username. Currently, they can put in almost any characters including things such as @ ! # etc.
I know I can use a regex, and that's probably what I'm opting for.
I'll be using a negated set, which I'm assuming is the right tool here, like so:
[^@!#]
So, how can I know all of the illegal characters to put in that set? I can start manually putting in the ones that are obvious such as !@#$%^&*(), but is there an easy way to do this without manually putting every single one of them in?
I know a lot of sites only allow strings that contain letters, numbers, dashes, or underscores. Something like that would work well for me.
Any help would be greatly appreciated.
Thanks S.O.!
Upvotes: 18
Views: 17807
Reputation: 9493
All the answers on this question seem to assume English. To allow for Unicode characters (so people can have URLs / user names in their native language), it is better to use a blacklist of reserved / unsafe characters than a whitelist of allowed characters.
Here is a regex that matches characters which are generally unsafe in a URL:
([&$+,:;=?@#\s<>\[\]{}|\\^%\/])+
(list based on unsafe characters mentioned in this answer)
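A minimal Python sketch of this blacklist approach, assuming the same character list as the regex above. The names `UNSAFE` and `find_unsafe` are made up for illustration, and the list itself is only as complete as the answer it was taken from.

```python
import re

# Characters treated as unsafe in a URL (same list as the regex above).
# Assumption: this list is illustrative, not an exhaustive standard.
UNSAFE = re.compile(r"[&$+,:;=?@#\s<>\[\]{}|\\^%/]")

def find_unsafe(username):
    """Return the unsafe characters found in username, in order of appearance."""
    return UNSAFE.findall(username)

def is_safe(username):
    """True if the name is non-empty and contains none of the unsafe characters."""
    return bool(username) and not UNSAFE.search(username)
```

With this, a name like `josé_müller` passes because non-ASCII letters are not on the list, while `bob@home/page` is rejected for `@` and `/`.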
Upvotes: 2
Reputation: 70732
Instead of using negation, place only what you want to allow inside of your character class.
^[a-zA-Z0-9_-]*$
Explanation:
^ # the beginning of the string
[a-zA-Z0-9_-]* # any character of: 'a' to 'z', 'A' to 'Z',
# '0' to '9', '_', '-' (0 or more times)
$ # before an optional \n, and the end of the string
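As a rough sketch, this is how the whitelist could be applied in Python. Two details here are my own assumptions rather than part of the answer: the quantifier is `+` instead of `*` so that an empty username is rejected, and `fullmatch` is used instead of `^...$` because `$` also matches just before a trailing newline. The names `USERNAME` and `is_valid_username` are hypothetical.

```python
import re

# Allow only ASCII letters, digits, underscore, and hyphen.
# Assumption: at least one character is required, hence `+` rather than `*`.
USERNAME = re.compile(r"[a-zA-Z0-9_-]+")

def is_valid_username(name):
    # fullmatch anchors at both ends and, unlike `$`, does not
    # tolerate a trailing "\n" after the last allowed character.
    return USERNAME.fullmatch(name) is not None
```

For comparison, `re.match(r"^[a-zA-Z0-9_-]*$", "john\n")` succeeds in Python, which is usually not what you want for a username check.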
Upvotes: 33
Reputation: 4069
One of the reasons you'll want to use an inclusive set is that excluding bad characters is very difficult with all the Unicode variants out there. Characters such as ß, ñ, œ, and æ will probably give you a headache. If you limit the username to just a subset of letters that YOU provide, you can easily chop out everything else that you may not want in there.
Upvotes: 2
Reputation: 3083
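Instead of denying values, maybe it's better to allow only some:
[[:word:]] -- digits, letters, and underscore (equivalent to \w; it is a POSIX-style class, so it has to sit inside a bracket expression, and not every regex engine supports it)
Check this chart: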
http://www.cheatography.com/davechild/cheat-sheets/regular-expressions/
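A small sketch of the same idea in Python, assuming `\w` as the stand-in because Python's `re` module has no POSIX classes. Note that `\w` matches Unicode word characters by default on `str` patterns and only `[a-zA-Z0-9_]` when `re.ASCII` is set. The helper name `is_word` is hypothetical.

```python
import re

# By default, \w on str patterns matches Unicode letters, digits, and underscore.
UNICODE_WORD = re.compile(r"\w+")
# With re.ASCII, \w is limited to [a-zA-Z0-9_].
ASCII_WORD = re.compile(r"\w+", re.ASCII)

def is_word(name, ascii_only=False):
    """True if name consists solely of word characters (1 or more)."""
    pattern = ASCII_WORD if ascii_only else UNICODE_WORD
    return pattern.fullmatch(name) is not None
```

Neither variant allows a hyphen, so `john-1` is rejected unless `-` is added to the class explicitly.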
Upvotes: 3