Reputation: 4686
I have the current regular expression:
/(?<=[\s>]|^)#(\w*[A-Za-z_]+\w*)/g
Which I'm testing against the string:
Here's a #hashtag and here is #not_a_tag; which should be different. Also testing: Mid#hash. #123 #!@£ and <p>#hash</p>
For my purposes there should only be two hashtags detected in this string. I'm wondering how to alter the expression so that it doesn't match hashtags that end with a ; (in my example, this is #not_a_tag;).
Cheers.
Upvotes: 18
Views: 52872
Reputation: 180
(?<=(\s|^))#[^\s\!\@\#\$\%\^\&\*\(\)]+(?=(\s|$))
A regex that matches any hashtag.
In this approach any character is accepted in a hashtag except whitespace and the symbols !@#$%^&*()
Usage Notes
Turn on "g" and "m" flags when using!
It was tested against the Java and JavaScript flavors using https://regex101.com and VS Code.
It is available on this repo.
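To check how this pattern behaves on the test string from the question, here is a minimal JavaScript sketch (the variable names are illustrative only). Because `;` is not in the excluded set, `#not_a_tag;` is still matched, and because `>` is not whitespace, the hashtag inside `<p>` is skipped, so this pattern does not by itself meet the question's requirements.

```javascript
// Test string taken from the question.
const text =
  "Here's a #hashtag and here is #not_a_tag; which should be different. " +
  "Also testing: Mid#hash. #123 #!@£ and <p>#hash</p>";

// Pattern from this answer, with the "g" and "m" flags turned on as advised.
const pattern = /(?<=(\s|^))#[^\s\!\@\#\$\%\^\&\*\(\)]+(?=(\s|$))/gm;

// With the "g" flag, match() returns every full match (or null if none).
const found = text.match(pattern) ?? [];
console.log(found); // [ '#hashtag', '#not_a_tag;', '#123' ]
```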
Upvotes: 1
Reputation: 1673
You could try this pattern: /#\S+/
It will include all characters after # up to the next whitespace.
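As a quick check, the sketch below runs this pattern over the test string from the question. The `g` flag is an addition of mine so that every match is collected; the variable names are illustrative. The pattern is very permissive: it picks up trailing punctuation, mid-word hashes and markup, so it returns six matches rather than the two the question asks for.

```javascript
// Test string taken from the question.
const text =
  "Here's a #hashtag and here is #not_a_tag; which should be different. " +
  "Also testing: Mid#hash. #123 #!@£ and <p>#hash</p>";

// Pattern from this answer; the "g" flag is added here to collect all matches.
const pattern = /#\S+/g;

const found = text.match(pattern) ?? [];
console.log(found);
// [ '#hashtag', '#not_a_tag;', '#hash.', '#123', '#!@£', '#hash</p>' ]
```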
Upvotes: 1
Reputation: 672
/(#(?:[^\x00-\x7F]|\w)+)/g
Starts with #, then at least one (+) non-ASCII symbol ([^\x00-\x7F], a range that excludes the ASCII symbols) or word character (\w).
This one should cover cases that include non-ASCII symbols, like "#їжак".
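A small sketch of this pattern in JavaScript follows; the sample sentence is hypothetical and only there to show an ASCII and a non-ASCII hashtag side by side. Note that the pattern has no lookbehind or semicolon check, so it addresses non-ASCII tags rather than the `;` requirement from the question.

```javascript
// Hypothetical sample text mixing a Cyrillic and an ASCII hashtag.
const sample = "I saw a #їжак and a #cat today";

// Pattern from this answer: "#" followed by one or more
// non-ASCII characters or word characters.
const pattern = /(#(?:[^\x00-\x7F]|\w)+)/g;

const found = sample.match(pattern) ?? [];
console.log(found); // [ '#їжак', '#cat' ]
```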
Upvotes: 9
Reputation: 957
How about the following:
\B(\#[a-zA-Z]+\b)(?!;)
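The sketch below tries this pattern in JavaScript against the test string from the question (the `g` flag and the variable names are my additions). It yields the two expected hashtags. One thing to be aware of: since the character class is only `[a-zA-Z]`, tags containing digits or underscores are rejected outright, which is also why `#not_a_tag` fails here even before the `(?!;)` check applies. The second input is a hypothetical example of that.

```javascript
// Test string taken from the question.
const text =
  "Here's a #hashtag and here is #not_a_tag; which should be different. " +
  "Also testing: Mid#hash. #123 #!@£ and <p>#hash</p>";

// Pattern from this answer; the "g" flag is added to collect all matches.
const pattern = /\B(\#[a-zA-Z]+\b)(?!;)/g;

const found = text.match(pattern) ?? [];
console.log(found); // [ '#hashtag', '#hash' ]

// Hypothetical input: tags with digits or underscores are not matched,
// because \b cannot succeed between two word characters.
const strict = "#web2 #snake_case".match(pattern) ?? [];
console.log(strict); // []
```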
Upvotes: 39
Reputation: 2852
Similar to anubhava's answer, but swap the two instances of \w* with \d*, as the only difference between \w and [A-Za-z_] is the 0-9 characters.
This has the effect of reducing the number of steps from 588 to 90:
(?<=[\s>])#(\d*[A-Za-z_]+\d*)\b(?!;)
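As a sanity check, the sketch below compares this variant with the `\w*` pattern it is derived from, in JavaScript (the `g` flag, variable names and second input are my own assumptions). On the question's test string both give the same two hashtags. They are not strictly equivalent, though: this variant omits the `|^` alternative, so a hashtag at the very start of the string is missed, and a tag that interleaves letters and digits (such as the hypothetical `#a1b`) no longer matches.

```javascript
// Test string taken from the question.
const text =
  "Here's a #hashtag and here is #not_a_tag; which should be different. " +
  "Also testing: Mid#hash. #123 #!@£ and <p>#hash</p>";

// The \w* pattern this answer starts from, and the \d* variant it proposes.
const original = /(?<=[\s>]|^)#(\w*[A-Za-z_]+\w*)\b(?!;)/g;
const variant = /(?<=[\s>])#(\d*[A-Za-z_]+\d*)\b(?!;)/g;

const fromOriginal = text.match(original) ?? [];
const fromVariant = text.match(variant) ?? [];
console.log(fromOriginal); // [ '#hashtag', '#hash' ]
console.log(fromVariant);  // [ '#hashtag', '#hash' ]

// Hypothetical edge cases where the two patterns differ.
const edge = "#first and #a1b";
const edgeOriginal = edge.match(original) ?? [];
const edgeVariant = edge.match(variant) ?? [];
console.log(edgeOriginal); // [ '#first', '#a1b' ]
console.log(edgeVariant);  // []
```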
Upvotes: 1
Reputation: 785196
You can use a negative lookahead regex:
/(?<=[\s>]|^)#(\w*[A-Za-z_]+\w*)\b(?!;)/
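Run against the test string from the question, this pattern returns only the two expected hashtags. The following is a minimal JavaScript sketch; the `g` flag and the variable names are my additions so that all matches can be collected, and lookbehind requires a reasonably recent JavaScript engine.

```javascript
// Test string taken from the question.
const text =
  "Here's a #hashtag and here is #not_a_tag; which should be different. " +
  "Also testing: Mid#hash. #123 #!@£ and <p>#hash</p>";

// Pattern from this answer; the "g" flag is added to collect all matches.
const pattern = /(?<=[\s>]|^)#(\w*[A-Za-z_]+\w*)\b(?!;)/g;

// Full matches, including the leading "#".
const found = text.match(pattern) ?? [];
console.log(found); // [ '#hashtag', '#hash' ]

// Capture group 1 holds the tag name without the "#".
const names = [...text.matchAll(pattern)].map((m) => m[1]);
console.log(names); // [ 'hashtag', 'hash' ]
```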
\b - word boundary ensures that we are at the end of a word
(?!;) - asserts that we don't have a semicolon at the next position
Upvotes: 4