Gopal Sharma

Reputation: 11

How to remove # from hashtag using Python RegEx

I need to remove the leading "#" symbol from hashtags in a text. For example, the sentence "I'm feeling #blessed." should become "I'm feeling blessed."

I have written the function below, but I suspect the same result can be achieved more simply with a regex.

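  def remove_hashes(sentence):  # illustrative function name
    clean_tokens = []
    for token in sentence.split():
      # startswith() compares string content (`is` would test object identity)
      if token.startswith('#'):
        token = token[1:]  # drop the leading '#'
      clean_tokens.append(token)
    # join with single spaces so the result has no trailing space
    return " ".join(clean_tokens)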
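One behavioral detail of a split-based loop like this, relevant when comparing it with a regex: sentence.split() collapses runs of whitespace (double spaces, tabs, newlines) and the join puts single spaces back, whereas a regex substitution such as re.sub(r'#(\S+)', r'\1', text) only touches the matched hashtags. The sketch below assumes an illustrative function name and a made-up sample string.

```python
import re

def remove_hashes(sentence):
    # Token-based approach (illustrative name): split on whitespace, strip one leading '#'
    return " ".join(t[1:] if t.startswith('#') else t for t in sentence.split())

text = "I'm  feeling\t#blessed."  # two spaces, then a tab, between words

loop_result = remove_hashes(text)              # runs of whitespace become single spaces
regex_result = re.sub(r'#(\S+)', r'\1', text)  # original spacing is kept; only '#' is removed

print(repr(loop_result))
print(repr(regex_result))
```

If the original spacing matters, the regex version is the safer choice.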

Any help would be appreciated.

Upvotes: -1

Views: 1027

Answers (2)

Onno Rouast

Reputation: 672

The regex #(\S+) provided by @Tim would also match a "#" that is not at the start of a word, i.e. one preceded by another non-whitespace character (\S), as in so#blessed.

We can prevent this by adding a negative lookbehind (?<!\S) before the hash, so that the "#" can only match at the start of the string or directly after whitespace.

import re

inp = "#I'm #feeling #blessed so#blessed .#here#."
output = re.sub(r'(?<!\S)#(\S+)', r'\1', inp)
print(output)

output:

I'm feeling blessed so#blessed .#here#.
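To make the difference concrete, here is a small sketch that runs both patterns on the same input. It also shows a side effect of the greedy \S+ group in the first pattern: in .#here#. the whole "here#." is captured by the first match, so only the first "#" is dropped and the second one survives inside the replacement. The variable names are illustrative.

```python
import re

inp = "#I'm #feeling #blessed so#blessed .#here#."

# Pattern from the other answer: any '#' followed by non-whitespace, anywhere in a word
anywhere = re.sub(r'#(\S+)', r'\1', inp)

# Pattern with a negative lookbehind: '#' must be at the start or right after whitespace
word_start = re.sub(r'(?<!\S)#(\S+)', r'\1', inp)

print(anywhere)    # so#blessed becomes soblessed; .#here#. becomes .here#.
print(word_start)  # so#blessed and .#here#. are left unchanged
```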

Upvotes: 0

Tim Biegeleisen

Reputation: 522751

You can use re.sub with a capture group, replacing each hashtag with the text that follows the "#":

import re

inp = "I'm feeling #blessed."
output = re.sub(r'#(\S+)', r'\1', inp)
print(output)  # I'm feeling blessed.
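A possible variant, sketched here as an assumption rather than as part of this answer: since only the "#" itself needs to go, zero-width lookarounds can replace the capture group and backreference. (?<!\S) requires the "#" to be at the start of the string or after whitespace, and (?=\S) requires a non-space character after it, so a lone "#" is left alone. The names HASHTAG_MARK and strip_hash_marks are hypothetical.

```python
import re

# Match only the '#' character: it must start a word (start of string or after
# whitespace) and be followed by a non-whitespace character. Nothing else is
# consumed, so the replacement is simply the empty string.
HASHTAG_MARK = re.compile(r'(?<!\S)#(?=\S)')

def strip_hash_marks(text):
    """Remove the leading '#' from each hashtag in text (hypothetical helper)."""
    return HASHTAG_MARK.sub('', text)

print(strip_hash_marks("I'm feeling #blessed."))
print(strip_hash_marks("#I'm #feeling #blessed so#blessed .#here#."))
print(strip_hash_marks("price # 5"))  # a lone '#' followed by a space is kept
```

Because the match never consumes the rest of the word, there is no greedy group to swallow a second "#" in the same token.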

Upvotes: 1
