Anay Purohit

Reputation: 13

Regex does not recognize '#' for removal

How do I remove the '#' from words in a string that start with '#', without removing a '#' that stands on its own, sits in the middle of a word, or comes at the end of a word?

Currently I am using this regular expression:

import re
test = "# #DataScience"
test = re.sub(r'\b#\w\w*\b', '', test)

to remove the '#' from words starting with '#', but it does not work at all: it returns the string unchanged.

Can anyone tell me why the '#' is not being recognized and removed? Examples:

Test: "# #DataScience"

Expected output: "# DataScience"

Test: "kjndjk#jnjkd"

Expected output: "kjndjk#jnjkd"

Test: "# #DataScience #KJSBDKJ kjndjk#jnjkd #jkzcjkh# iusadhuish#"

Expected output: "# DataScience KJSBDKJ kjndjk#jnjkd jkzcjkh# iusadhuish#"

Upvotes: 1

Views: 78

Answers (4)

Mohammed Elhag

Reputation: 4302

Try this:

import re
test = "# #DataScience #KJSBDKJ kjndjk#jnjkd #jkzcjkh# iusadhuish#"
test = re.sub(r'(?<!\S)#(?=\S)', '', test)
print(test)

Output:

# DataScience KJSBDKJ kjndjk#jnjkd jkzcjkh# iusadhuish#
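Why this works: (?<!\S) is a negative lookbehind that succeeds when the '#' is at the start of the string or preceded by whitespace, and (?=\S) is a lookahead that requires a non-whitespace character right after it. A lone '#', a '#' inside a word, and a '#' at the end of a word therefore all survive. The check below is my own sketch, not part of the answer; it runs the same substitution over the three examples from the question, and the helper name strip_leading_hash is hypothetical.

```python
import re

def strip_leading_hash(text):
    # Drop a '#' only when it starts a word: preceded by whitespace or the
    # start of the string, and followed by a non-whitespace character.
    return re.sub(r'(?<!\S)#(?=\S)', '', text)

examples = [
    "# #DataScience",
    "kjndjk#jnjkd",
    "# #DataScience #KJSBDKJ kjndjk#jnjkd #jkzcjkh# iusadhuish#",
]
for text in examples:
    print(strip_leading_hash(text))
```

Each printed line matches the corresponding expected output listed in the question.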

Upvotes: 1

arieljuod

Reputation: 15838

I know there's an accepted answer, but I came up with this regex, which also seems to work fine. Personally I prefer it because I find it easier to read:

(\A|[^#\d\w])#\w\w*\b
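The answer gives only the pattern. Because the first group consumes the character before the '#', replacing the whole match with '' would also delete that space (and the word). A minimal sketch of one way to use it, assuming you add a second capture group around the word and write both groups back with a backreference replacement; that extra group and the helper name remove_hash_prefix are my additions, not part of the answer. Note that \d is already included in \w, so [^#\d\w] behaves the same as [^#\w].

```python
import re

# Pattern from the answer, with an extra group around the word (assumption):
#   group 1: start of string (\A) or one character that is neither '#' nor a word char
#   group 2: the word that follows the '#'
PATTERN = re.compile(r'(\A|[^#\d\w])#(\w\w*)\b')

def remove_hash_prefix(text):
    # Put back the preceding character and the word; only the '#' is dropped.
    return PATTERN.sub(r'\1\2', text)

print(remove_hash_prefix("# #DataScience #KJSBDKJ kjndjk#jnjkd #jkzcjkh# iusadhuish#"))
```

On the question's longest example this prints "# DataScience KJSBDKJ kjndjk#jnjkd jkzcjkh# iusadhuish#".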

Upvotes: 0

MoonMist
MoonMist

Reputation: 1227

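Your \b is in the wrong place. Since '#' is not a word character, a \b placed before it only matches when the '#' directly follows a word character.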

Your regex should be:

r'#\b\w+\b'

Also, the + quantifier means one or more occurrences, so \w+ replaces your \w\w*.
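A quick way to see the placement problem: \b marks a boundary between a word character and a non-word character, and '#' is a non-word character. So \b# can only match when the '#' directly follows a word character, while #\b matches when a word character directly follows the '#'. The check below is my own illustration, not part of the answer.

```python
import re

# '\b#' needs a word character right before the '#'
print(re.findall(r'\b#\w+', "# #DataScience"))   # []
print(re.findall(r'\b#\w+', "kjndjk#jnjkd"))     # ['#jnjkd']

# '#\b' needs a word character right after the '#'
print(re.findall(r'#\b\w+', "# #DataScience"))   # ['#DataScience']
print(re.findall(r'#\b\w+', "kjndjk#jnjkd"))     # ['#jnjkd']
```

Note that #\b\w+ still matches the '#jnjkd' inside "kjndjk#jnjkd", and replacing the match with '' removes the word along with the '#', so on its own this pattern does not reproduce the expected output from the question.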

Upvotes: 0

Tim Biegeleisen
Tim Biegeleisen

Reputation: 521389

The problem with your pattern is that # is not a word character, so a \b in front of it does not match where you expect. You can use a lookbehind instead:

import re
test = "#HereToHelp STUFF #DataScience"
print(test)
test = re.sub(r'(?:(?<= )|^)#\w+\b', '', test)
print(test)

#HereToHelp STUFF #DataScience
 STUFF 
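This pattern removes the whole hashtag, as the output shows. If, as in the question's expected output, you only want to drop the '#' and keep the word, one option (my own variation, not part of the answer; the helper name drop_hash_keep_word is hypothetical) is to capture the word and substitute it back:

```python
import re

def drop_hash_keep_word(text):
    # Same lookbehind idea: '#' at the start of the string or right after a space.
    # The word is captured in group 1 and written back, so only the '#' is removed.
    return re.sub(r'(?:(?<= )|^)#(\w+)\b', r'\1', text)

print(drop_hash_keep_word("#HereToHelp STUFF #DataScience"))
print(drop_hash_keep_word("# #DataScience #KJSBDKJ kjndjk#jnjkd #jkzcjkh# iusadhuish#"))
```

The first call prints "HereToHelp STUFF DataScience" and the second prints the question's expected output. The lookbehind only checks for a literal space, so a '#' after a tab or newline would not be handled.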

Upvotes: 0
