Reputation: 13
How can I remove the '#' from words in a string that start with '#', while leaving a '#' untouched when it appears by itself, in the middle of a word, or at the end of a word?
Currently I am using this regular expression:
import re
test = "# #DataScience"
test = re.sub(r'\b#\w\w*\b', '', test)
to remove the '#' from words starting with '#', but it does not work at all: it returns the string unchanged.
Can anyone please tell me why the '#' is not being recognized and removed?
Examples:
Test - "# #DataScience"
Expected Output - "# DataScience"
Test - "kjndjk#jnjkd"
Expected Output - "kjndjk#jnjkd"
Test - "# #DataScience #KJSBDKJ kjndjk#jnjkd #jkzcjkh# iusadhuish#"
Expected Output - "# DataScience KJSBDKJ kjndjk#jnjkd jkzcjkh# iusadhuish#"
Upvotes: 1
Views: 78
Reputation: 4302
Try this:
import re
test = "# #DataScience #KJSBDKJ kjndjk#jnjkd #jkzcjkh# iusadhuish#"
test = re.sub(r'(?<!\S)#(?=\S)', '', test)
Output:
# DataScience KJSBDKJ kjndjk#jnjkd jkzcjkh# iusadhuish#
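For anyone wondering how that pattern works: (?<!\S) is a negative lookbehind that succeeds only at the start of the string or right after whitespace, and (?=\S) is a lookahead that requires a non-whitespace character after the '#'. Only the '#' itself is consumed, so the word is kept. Here is a minimal sketch that wraps it in a function and runs it on the examples from the question (the names LEADING_HASH and strip_leading_hash are made up for illustration):

```python
import re

# Pattern taken from the answer above; the names are hypothetical.
LEADING_HASH = re.compile(r'(?<!\S)#(?=\S)')

def strip_leading_hash(text):
    """Drop a '#' that starts a token; keep lone, inner, and trailing '#'."""
    return LEADING_HASH.sub('', text)

print(strip_leading_hash("# #DataScience"))   # "# DataScience"
print(strip_leading_hash("kjndjk#jnjkd"))     # "kjndjk#jnjkd"
```

One thing to be aware of: because the lookahead is \S rather than \w, a '#' followed by punctuation (for example "#!") is removed as well. If that matters, (?=\w) may be a better fit.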
Upvotes: 1
Reputation: 15838
I know there's an accepted answer, but I came up with this regexp, which seems to work fine too. Personally, I prefer it since it's easier for me to read:
(\A|[^#\d\w])#\w\w*\b
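The pattern above matches the whole hashtag plus the character in front of it, so it cannot be passed to re.sub with an empty replacement as-is. One possible way to use it, sketched here under the assumption that only the '#' should go, is a replacement function that removes the first '#' of each match (HASHTAG and drop_hash are hypothetical names):

```python
import re

# Pattern copied from the answer above; the names are made up for this sketch.
HASHTAG = re.compile(r'(\A|[^#\d\w])#\w\w*\b')

def drop_hash(match):
    # The match may start with one separator character, which can never
    # be '#', so the first '#' in the match is the one that starts the word.
    return match.group(0).replace('#', '', 1)

test = "# #DataScience #KJSBDKJ kjndjk#jnjkd #jkzcjkh# iusadhuish#"
result = HASHTAG.sub(drop_hash, test)
print(result)  # "# DataScience KJSBDKJ kjndjk#jnjkd jkzcjkh# iusadhuish#"
```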
Upvotes: 0
Reputation: 1227
Your \b is not correctly placed.
Your regex should be:
r'#\b\w+\b'
Also, the + quantifier means one or more occurrences, which saves the need for your \w\w*.
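A quick check of what this pattern picks up on the longest example from the question may be useful. As far as I can tell it does find the hashtags, but since nothing constrains what comes before the '#', it also matches the '#jnjkd' inside 'kjndjk#jnjkd', which the question wants left alone. The snippet only lists the matches:

```python
import re

test = "# #DataScience #KJSBDKJ kjndjk#jnjkd #jkzcjkh# iusadhuish#"

# Pattern from the answer above. The lone '#' and the trailing '#'
# are skipped because no word character follows them.
matches = re.findall(r'#\b\w+\b', test)
print(matches)  # ['#DataScience', '#KJSBDKJ', '#jnjkd', '#jkzcjkh']
```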
Upvotes: 0
Reputation: 521389
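The problem with your pattern is that # is not a word character, so a \b in front of it only matches when the preceding character is a word character. After a space or at the start of the string there is no word boundary, and the pattern fails. You may instead use a lookbehind: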
import re
test = "#HereToHelp STUFF #DataScience"
print(test)
test = re.sub(r'(?:(?<= )|^)#\w+\b', '', test)
print(test)
Output:
#HereToHelp STUFF #DataScience
 STUFF 
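To make the word-boundary point concrete, here is a small sketch showing where \b# does and does not match. Note that the substitution above removes the whole hashtag; if only the '#' should be dropped, as in the question's expected output, one assumed variant is to capture the word and put it back with \1 (the variable names are illustrative only):

```python
import re

# '\b' needs a word character on exactly one side. ' ' and '#' are both
# non-word characters, so there is no boundary between them.
no_match = re.search(r'\b#', " #DataScience")
mid_word = re.search(r'\b#', "kjndjk#jnjkd")
print(no_match)            # None
print(mid_word.group(0))   # '#', found right after the word character 'k'

# Assumed variant of the pattern above: capture the word and re-insert it,
# so that only the '#' is removed.
test = "#HereToHelp STUFF #DataScience"
cleaned = re.sub(r'(?:(?<= )|^)#(\w+)\b', r'\1', test)
print(cleaned)  # "HereToHelp STUFF DataScience"
```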
Upvotes: 0