Zolly

Reputation: 317

How to match until hitting a certain character?

I apologize if my question is stupid. I just started learning regex a few hours ago.

I am trying to extract all hashtags.

Special characters are allowed; a hashtag ends when it reaches a space, another hashtag (#), or a newline.

Here is my current regex: \#{1}(\S|\N)

I tried changing it to \#{1}.+(\S|\N) because I assumed the .+ would let it keep matching until it reaches a newline or a space.

======================TESTHASH========================
#3!x_j@`(/l3W#qfSnl#6R7x1b,jBb0p#Oq/:o#!tH3AITK^Yyp#B,
#qwe#%#T &#v#v#N###O###2#` `S}^&9 #M # Aa23%2##p#?#w#a
#123#9#Z a%h#&#C###;###? a#u#g#Q#r#8# #a#A#l#p#r#b#}#c
#R#M#(#p###K###l###1###b 2#D\'>.w/Y_2 sha2&2{] #4x$D~kR
#lbTb1k3# #Dlo ## #j# #W H#tjsR.Lzkc  #B*xt&nFty?il#jp
#>p8BTU2###PW!aB###z###-VM (s82hdk#T 8sUJWfuy2#-#f~fh)
#d{jyi|^ofYD#q)!#special~!@$%^&*()#_+`-=[];\',./?><\":}{
======================TESTHASH========================

Upvotes: 3

Views: 518

Answers (3)

sshine

Reputation: 16105

How about this: #[^#\s\n]+

  • It matches all hashtags.
  • It stops at spaces and newlines.
  • It doesn't match two hashtags in a row. (This sentence is a bit ambiguous; is ## two hashtags of length zero, or zero hashtags? #[^#\s\n]* is equivalent to Sweeper's regex, but without the look-ahead. #[^#\s\n]+ additionally requires that hashtags don't have zero characters after them.)
  • All characters allowed after hashtag except hashtag, space and newline.
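To make the difference between + and * concrete, here is a minimal sketch in Python. The sample string is a fragment of one line from the question's test data, and the variable names are only for illustration.

```python
import re

# A fragment of one line from the question's test data.
sample = "#Dlo ## #j# #W"

# '+' requires at least one character after the '#',
# so a bare '#' is not reported as a hashtag.
non_empty = re.findall(r"#[^#\s\n]+", sample)

# '*' also accepts a '#' with nothing after it.
allow_empty = re.findall(r"#[^#\s\n]*", sample)

print(non_empty)
print(allow_empty)
```

On this fragment the + version keeps only #Dlo, #j and #W, while the * version also reports every bare # as a match of its own.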

This is what #[^#\s\n]+ matches:

regex101 match image

It seems to secretly spell out "NICE"; I wonder if this is an exercise and you're using StackOverflow to think for you? :-)

Upvotes: 3

Arnab Mukherjee

Reputation: 21

\#[^\s\#]*(\s|\#)

Matches a # followed by any number of characters other than whitespace and #, followed by a whitespace character or a #.

This should work
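One thing worth checking with this pattern: the trailing (\s|\#) is part of the match, so the delimiter is consumed. A quick sketch in Python, using a made-up sample string, suggests that this makes it skip a hashtag that directly follows another one, and miss a hashtag at the very end of the input.

```python
import re

# Made-up sample: three adjacent hashtags, a space, and a final hashtag.
sample = "#a#b#c #d"

pattern = re.compile(r"\#[^\s\#]*(\s|\#)")

# Use group(0) to get the whole match; findall() would return only
# the contents of the capturing group.
consuming = [m.group(0) for m in pattern.finditer(sample)]

print(consuming)
```

Here the first match swallows the # that starts #b, so #b is never seen, and #d has no delimiter after it, so it does not match either.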

Upvotes: 0

Sweeper

Reputation: 271625

I made a few changes to your regex to get it to match the hashtags in your test data.

This is the regex:

\#.*?(?=\s|\n|\#|$)

Changes I've made:

  • used a lazy "zero or more" quantifier, *?. It matches as few characters as possible, extending the match one character at a time until the lookahead (?=\s|\n|\#|$) succeeds. A greedy quantifier would instead match all the way to the end of the line and then backtrack until the lookahead succeeds.
  • removed {1}, which is unnecessary: a token matches exactly once by default.
  • added more options to the end: \# and $. When another # or the end of the input is encountered, the match should stop.
  • used a lookahead, so the terminating character is checked but not consumed. This avoids getting another # into the match.
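The lazy versus greedy difference can be seen in a short Python sketch. The sample string is a fragment of the question's test data; the greedy variant is included only for comparison and is not part of the answer's regex.

```python
import re

# A fragment of one line from the question's test data.
sample = "#Dlo ## #j# #W"

# Lazy: stop at the first position where the lookahead succeeds.
lazy = re.findall(r"\#.*?(?=\s|\n|\#|$)", sample)

# Greedy (for comparison only): run to the end of the line first,
# where the '$' alternative of the lookahead already succeeds.
greedy = re.findall(r"\#.*(?=\s|\n|\#|$)", sample)

print(lazy)
print(greedy)
```

The lazy version splits the fragment into separate hashtags (including each bare #), while the greedy version returns the whole fragment as a single match.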


Upvotes: 3
