Reputation: 4310
I have a field in my application where users can enter a hashtag. I want to validate their entry and make sure it is a proper hashtag. It can be in any language, and it should NOT be preceded by the # sign. I am writing in JavaScript.
So the following are GOOD examples:
And the following are BAD examples:
We had a regex that matched only a-zA-Z0-9. We needed to add language support, so we changed it to exclude white space and forgot to exclude special characters, so here I am.
Some other Stack Overflow examples I saw that didn't work for me:
Upvotes: 7
Views: 3709
Reputation: 19
/#[\p{L}\p{N}_]+/gu
This works for me, and addresses many of the concerns mentioned in comments.
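A minimal sketch of how this pattern might be used, assuming an engine with ES2018 Unicode property escapes (the `u` flag is required). The function names `extractHashtags` and `isValidBareTag` are made up for illustration, and the anchored variant without `#` is an adaptation for the question's bare-tag field, not part of the answer's regex.

```javascript
// Finds every hashtag in free text. \p{L} = any letter, \p{N} = any number.
const HASHTAG_IN_TEXT = /#[\p{L}\p{N}_]+/gu;

// Hypothetical helper: returns all "#tag" matches, or an empty array.
function extractHashtags(text) {
  // match() with a global regex returns every match, or null if none.
  return text.match(HASHTAG_IN_TEXT) ?? [];
}

// Assumed adaptation: the question's field holds a tag WITHOUT "#",
// so the whole input is anchored and the "#" is dropped.
const BARE_TAG = /^[\p{L}\p{N}_]+$/u;

// Hypothetical helper: true when the entire input is letters, numbers, or "_".
function isValidBareTag(input) {
  return BARE_TAG.test(input);
}

console.log(extractHashtags("Try #привет and #hello_2 now!"));
// → ["#привет", "#hello_2"]
console.log(isValidBareTag("日本語"));    // → true
console.log(isValidBareTag("not valid")); // → false (contains a space)
console.log(isValidBareTag("#tag"));      // → false (leading # not allowed)
```

Note that `\p{L}` does not cover combining marks, so scripts that rely on them may need extra classes.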
Upvotes: 0
Reputation: 415
First, excluding every symbol is not a practical solution, because the available symbols depend on the keyboard layout and there are hundreds of math symbols and so on. So use this (with the u flag):
[\p{sc=Bengali}\p{L}_\p{N}]+
1. If a language needs extra care, add its script to the class, e.g. \p{sc=Bengali}\p{sc=Latin}. The value must be a Unicode script name rather than a language name (Spanish, for instance, is covered by \p{sc=Latin}), and inside a character class | is a literal pipe, so the entries are simply listed side by side. Bangla has dependent vowel signs such as া, ে, ৌ, which are combining marks rather than letters, so Bangla has to be recognized separately first with \p{sc=Bengali}.
2. Then use \p{L}, which matches a single code point in the category "letter": a-z as well as letters like é, ü, ğ, i, ç, or any ordinary letter of any alphabet.
3. _ allows the underscore.
4. \p{N} matches any kind of numeric character in any script. (\d matches only a digit, i.e. [0-9]; to allow Unicode digits, \p{N} is needed, because it works with any digit code point.)
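The points above can be sketched as follows, assuming an ES2018+ engine (the `u` flag is required for `\p{...}`). The constant and function names are hypothetical, and the pattern is anchored here so that it validates a whole input; inside a character class `|` is a literal pipe, so it is left out.

```javascript
// Letters, numbers and underscore only: misses combining marks.
const LETTERS_ONLY = /^[\p{L}\p{N}_]+$/u;

// Same class plus everything in the Bengali script, which includes
// the dependent vowel signs (Unicode category Mark, not Letter).
const WITH_BENGALI = /^[\p{sc=Bengali}\p{L}\p{N}_]+$/u;

// Hypothetical helper for validating a bare tag (no leading "#").
function isValidTag(input) {
  return WITH_BENGALI.test(input);
}

// "বাংলা" written with escapes: BA, vowel sign AA, anusvara, LA, vowel sign AA.
const bangla = "\u09AC\u09BE\u0982\u09B2\u09BE";

console.log(LETTERS_ONLY.test(bangla)); // → false (vowel signs are not \p{L})
console.log(isValidTag(bangla));        // → true
console.log(isValidTag("año_2024"));    // → true (ñ is a letter)
console.log(isValidTag("a|b"));         // → false (pipe is not allowed)

// Inside a class, "|" is just a character, not alternation:
console.log(/^[\p{L}|]+$/u.test("a|b")); // → true
```

Other scripts with combining marks would need their own `\p{sc=...}` entry, or the general mark category `\p{M}`, depending on how permissive the field should be.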
Upvotes: 0
Reputation: 8423
I don't understand why this question does not get more votes. Hashtag detection for multiple languages is a problem. The only working option I could find is the one posted by Lucas above (none of the others work as well).
It needs a modification though:
#[^\s!@#$%^&*()=+.\/,\[{\]};:'"?><]+
This detects all the hashtags, not only at the beginning of the string, fixes an unescaped character, and removes the unnecessary $ at the end.
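A short sketch of how the unanchored pattern might be applied to free text. The `g` flag and the helper name `findHashtags` are assumptions added for illustration; the character class is the one from this answer.

```javascript
// "#" followed by one or more characters that are neither white space
// nor one of the blacklisted special characters.
const HASHTAG = /#[^\s!@#$%^&*()=+.\/,\[{\]};:'"?><]+/g;

// Hypothetical helper: returns every hashtag found in the text.
function findHashtags(text) {
  // match() with a global regex returns all matches, or null if none.
  return text.match(HASHTAG) ?? [];
}

console.log(findHashtags("Liked #привет, then #東京! (see #tag_1)"));
// → ["#привет", "#東京", "#tag_1"]
```

Because this is a blacklist, any character not listed (a hyphen or an emoji, for example) is still accepted as part of a tag.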
Upvotes: 4
Reputation: 51430
If your disallowed characters list is thorough (!@#$%^&*()=+./,[{]};:'"?><), then the regex is:
^#?[^\s!@#$%^&*()=+./,\[{\]};:'"?><]+$
This allows an optional leading # sign: #?. It disallows the special characters using a negated character class. I just added \s to the list (white space), and I also escaped [ and ].
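For illustration, the anchored pattern could be wrapped in a small validator like the one below. The name `isValidHashtag` is hypothetical, and the `/` inside the class is escaped here as a precaution for the regex literal.

```javascript
// Optional leading "#", then one or more characters that are neither
// white space nor in the blacklist, spanning the whole input.
const VALID_TAG = /^#?[^\s!@#$%^&*()=+.\/,\[{\]};:'"?><]+$/;

// Hypothetical helper: true when the whole input is an acceptable hashtag.
function isValidHashtag(input) {
  return VALID_TAG.test(input);
}

console.log(isValidHashtag("hello"));   // → true
console.log(isValidHashtag("#hello"));  // → true  (leading # is optional)
console.log(isValidHashtag("##hello")); // → false (second # is blacklisted)
console.log(isValidHashtag("he llo"));  // → false (white space)
console.log(isValidHashtag("a.b"));     // → false (dot is blacklisted)
console.log(isValidHashtag("日本語"));   // → true
```

If the field must reject a leading `#` entirely, as the question asks, dropping `#?` from the pattern is enough, since `#` is already in the blacklist.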
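Unfortunately, you can't use constructs like \p{P} (Unicode punctuation) in JavaScript's regexes unless the engine supports Unicode property escapes, which were added in ES2018 and require the u flag. Without them, you basically have to blacklist characters or take a different approach if the regex solution isn't good enough for your needs.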
Upvotes: 5