jakc

Reputation: 1171

Remove words from string that have a prefix of # or @?

I have a block of text like this:

Hello @Simon, I had a great day today. #StackOverflow

I want to find the most elegant solution to stripping it down to look like this:

Hello, I had a great day today.

i.e. I want to strip out all words that have a prefix of # or @. (And yes, I'm inspecting tweets.)

I am new to Python, and I would be OK doing this on single words, but I'm not sure of the best way to achieve this on a string that contains multiple words.

My first thoughts would be to use replace, but that would just strip out the actual @ and # symbols. Looking for the best way to strip out any word that has a prefix of # or @.

-EDIT- Not sure if this invalidates the answers given, but for acceptance, I also need to handle strings where multiple words have a prefix of # or @, e.g. hello #hiya #ello

Upvotes: 0

Views: 667

Answers (3)

phihag

Reputation: 287865

You can use regular expressions:

>>> import re
>>> s = 'Hello @Simon, I had a great day today. #StackOverflow'
>>> re.sub(r'(?:^|\s)[@#].*?(?=[,;:.!?]|\s|$)', r'', s)
'Hello, I had a great day today.'
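For reuse, the same pattern can be wrapped in a small helper. This is only a sketch: the names `TAG_RE` and `strip_tags` are made up for illustration, and the final `.strip()` is an added assumption that leading whitespace left behind by a tag at the very start of the string should be dropped.

```python
import re

# Same pattern as above: a tag starts at the beginning of the string or after
# whitespace, begins with @ or #, and runs (lazily) up to the next punctuation
# mark, whitespace character, or the end of the string.
TAG_RE = re.compile(r'(?:^|\s)[@#].*?(?=[,;:.!?]|\s|$)')

def strip_tags(text):
    """Remove every @mention and #hashtag, keeping punctuation that follows them."""
    # .strip() is an assumption: it tidies the space left when the text starts with a tag.
    return TAG_RE.sub('', text).strip()

print(strip_tags('Hello @Simon, I had a great day today. #StackOverflow'))
print(strip_tags('hello #hiya #ello'))
```

Because the whitespace before each tag is part of the match, consecutive tags such as `#hiya #ello` are removed one after another, which covers the case added in the question's edit.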

Upvotes: 4

Julien Vivenot

Reputation: 2250

' '.join([w for w in s.split() if w[0] not in ['@','#']])

Where s is your tweet.

Upvotes: 0

user1552512

Reputation: 909

It's as simple as writing an anonymous function and passing it to filter:

' '.join(filter(lambda x: x[0] not in ['@','#'], tweet.split()))

This will lose any punctuation attached to a removed @user or #topic (such as the comma after @Simon), but if you're just processing the tweets you probably won't miss it.
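If the punctuation does matter, one possible variation on the split-based approach is to move whatever trails a removed word onto the previous kept word. The sketch below is an illustration, not part of the original answer: the function name `strip_tags_keep_punct` and the set of characters treated as punctuation (`,;:.!?`) are assumptions.

```python
def strip_tags_keep_punct(tweet):
    """Drop words starting with @ or #, but keep punctuation that trailed them."""
    kept = []
    for word in tweet.split():
        if word.startswith(('@', '#')):
            # Whatever rstrip() would remove is the trailing punctuation,
            # e.g. the "," in "@Simon,".
            trailing = word[len(word.rstrip(',;:.!?')):]
            if trailing and kept:
                # Attach it to the previous kept word ("Hello" -> "Hello,").
                kept[-1] += trailing
        else:
            kept.append(word)
    return ' '.join(kept)

print(strip_tags_keep_punct('Hello @Simon, I had a great day today. #StackOverflow'))
```

One limitation of this sketch: if the previous word already ends in punctuation, the marks are simply appended, so `today. #tag.` becomes `today..`.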

Upvotes: 1
