Reputation: 385
I'm having trouble writing a regex (in Ruby, but I don't think that changes anything) that selects all proper hashtags.
I used /(^|\s)(#+)(\w+)(\s|$)/, which doesn't work and I have no idea why.
In this example:
#start #middle #middle2 #middle3 bad#example #another#bad#example #end
it should mark #start, #middle, #middle2, #middle3 and #end.
Why doesn't my code work and how should a proper regex look?
Upvotes: 0
Views: 249
Reputation: 492
As for why the original does not work, let's look at each part:
1. (^|\s) start of line or whitespace
2. (#+) one or more #
3. (\w+) one or more word characters (letters, digits or underscore)
4. (\s|$) whitespace or end of line
The main problem is a conflict between 1 and 4. When 1 should match the whitespace in front of a tag, that whitespace has already been consumed by 4 at the end of the previous match. So 1 cannot match there, and the regex moves on to the next possible position.
4 does not need to consume anything, since 3 stops at whitespace anyway. (Without any end check, though, #another in #another#bad#example also matches.)
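To see the conflict in practice, here is a minimal Ruby sketch that runs the question's regex over the sample line with String#scan. The variable names (text, original, original_tags) are only illustrative.

```ruby
# Sample line from the question.
text = "#start #middle #middle2 #middle3 bad#example #another#bad#example #end"

# The question's regex: the last group consumes the whitespace after each tag.
original = /(^|\s)(#+)(\w+)(\s|$)/

# scan returns one array of groups per match; index 2 is the tag name (\w+).
original_tags = text.scan(original).map { |groups| groups[2] }

p original_tags # => ["start", "middle2", "end"]
```

Every tag that directly follows a matched tag is skipped, because the space between them was already used up by the previous match.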
So here is the result
(?:^|\s)#(\w+)
https://regex101.com/r/iU4dZ3/3
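A quick way to check this pattern in Ruby is sketched below. The second pattern, with a lookahead (?=\s|$), is an addition for illustration and not part of the answer itself; it assumes the asker wants #another#bad#example rejected entirely, as the question's expected output suggests. All variable names are made up for the example.

```ruby
text = "#start #middle #middle2 #middle3 bad#example #another#bad#example #end"

# Pattern from the answer: non-capturing prefix, one capture group for the name.
simple = /(?:^|\s)#(\w+)/
simple_tags = text.scan(simple).flatten

# Illustrative variant: the lookahead checks for whitespace or end of line
# without consuming it, so the next tag can still match that whitespace.
strict = /(?:^|\s)#(\w+)(?=\s|$)/
strict_tags = text.scan(strict).flatten

p simple_tags # => ["start", "middle", "middle2", "middle3", "another", "end"]
p strict_tags # => ["start", "middle", "middle2", "middle3", "end"]
```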
Upvotes: 4
Reputation: 11338
One more regex:
\B#\w+\b
This one doesn't capture whitespace.
https://regex101.com/r/iU4dZ3/4
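As a rough check, this is how the pattern behaves on the question's sample line in Ruby (variable names are illustrative). Since there are no capture groups, scan returns the full matches, including the #.

```ruby
text = "#start #middle #middle2 #middle3 bad#example #another#bad#example #end"

# \B requires that the # is NOT at a word boundary, i.e. not glued to a
# preceding word character; \b ends the tag at the first non-word character.
boundary = /\B#\w+\b/
boundary_tags = text.scan(boundary)

p boundary_tags # => ["#start", "#middle", "#middle2", "#middle3", "#another", "#end"]
```

It rejects bad#example, but it still accepts the leading #another of #another#bad#example, because \b is satisfied between a letter and the following #.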
Upvotes: 0
Reputation: 4784
Does [^#\w](#[\w]*)|^(#[\w]*) work?
It finds a # that does not follow a word character or another #, and captures everything up to the first non-word character.
The alternation handles the case where the # is the first character of the line.
Live demo: http://regexr.com/3al01
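For reference, a small Ruby sketch of this pattern on the question's sample line (names are illustrative). Because the two alternatives use different capture groups, each scan result contains one nil, which is dropped here with compact.

```ruby
text = "#start #middle #middle2 #middle3 bad#example #another#bad#example #end"

# Either a # after a character that is neither # nor a word character,
# or a # at the start of a line.
alternation = /[^#\w](#[\w]*)|^(#[\w]*)/

# Each match yields [group1, group2] with one of them nil.
alternation_tags = text.scan(alternation).flatten.compact

p alternation_tags # => ["#start", "#middle", "#middle2", "#middle3", "#another", "#end"]
```

Note that [\w]* also allows an empty name, so a lone # would count as a match.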
Upvotes: 1
Reputation: 23880
How does this work for you?
(#[^\s+]+)
This says: find a hash sign, then everything up to the next whitespace (or a literal +, since + is not a quantifier inside a character class).
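A short Ruby sketch of what this pattern picks up on the question's sample line (variable names are illustrative):

```ruby
text = "#start #middle #middle2 #middle3 bad#example #another#bad#example #end"

# A # followed by one or more characters that are neither whitespace nor "+".
greedy = /(#[^\s+]+)/
greedy_tags = text.scan(greedy).flatten

p greedy_tags
# => ["#start", "#middle", "#middle2", "#middle3", "#example", "#another#bad#example", "#end"]
```

Since nothing is required before the #, the #example part of bad#example and the whole of #another#bad#example are matched as well.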
Upvotes: 0