Reputation: 1171
I have a block of text like this:
Hello @Simon, I had a great day today. #StackOverflow
I want to find the most elegant solution to stripping it down to look like this:
Hello, I had a great day today.
i.e. I want to strip out all words that have a prefix of # or @. (And yes, I'm inspecting tweets.)
I am new to Python, and I would be OK doing this on single words, but I'm not sure of the best way to achieve this on a string that contains multiple words.
My first thoughts would be to use replace, but that would just strip out the actual @ and # symbols. Looking for the best way to strip out any word that has a prefix of # or @.
-EDIT- Not sure if this invalidates the answers given, but for acceptance, I also need to handle text where multiple words have a prefix of # or @, e.g. hello #hiya #ello
Upvotes: 0
Views: 667
Reputation: 287865
You can use regular expressions:
>>> import re
>>> s = 'Hello @Simon, I had a great day today. #StackOverflow'
>>> re.sub(r'(?:^|\s)[@#].*?(?=[,;:.!?]|\s|$)', r'', s)
'Hello, I had a great day today.'
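To check the edited requirement (several tagged words in a row), the same pattern can be wrapped in a small helper. This is only a sketch: the name strip_tags and the final .strip() call are my additions, not part of the answer above. The trailing whitespace is matched by a lookahead rather than consumed, so the space before the next tag is still available to start the next match.

```python
import re

# Same pattern as in the answer above, compiled once for reuse.
TAG_RE = re.compile(r'(?:^|\s)[@#].*?(?=[,;:.!?]|\s|$)')

def strip_tags(text):
    """Remove words prefixed with @ or #, keeping trailing punctuation.

    strip_tags is a hypothetical helper name; .strip() is added so that a
    tag at the very start of the text does not leave a leading space.
    """
    return TAG_RE.sub('', text).strip()

print(strip_tags('Hello @Simon, I had a great day today. #StackOverflow'))
print(strip_tags('hello #hiya #ello'))
```

Because the pattern requires the start of the string or whitespace before the @ or #, something like an e-mail address in the middle of a word is left alone.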
Upvotes: 4
Reputation: 2250
' '.join([w for w in s.split() if w[0] not in ['@','#']])
where s is your tweet.
Upvotes: 0
Reputation: 909
It's as simple as writing an anonymous function and passing it to filter:
' '.join(filter(lambda x: x[0] not in ['@','#'], tweet.split()))
This will lose any comma attached to an @user or #topic (e.g. the one in "@Simon,"), but if you're just processing the tweets you probably won't miss it.
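If the punctuation does matter, one possible variation on the split-based approach is to move any punctuation trailing a dropped word onto the previous kept word. This is a rough sketch, not a polished solution: the function name strip_tagged_words and the prefixes and keep parameters are made up for illustration.

```python
def strip_tagged_words(tweet, prefixes=('@', '#'), keep=',;:.!?'):
    """Drop words starting with one of `prefixes`, preserving the
    punctuation that trailed them (hypothetical helper)."""
    kept = []
    for word in tweet.split():
        if word.startswith(prefixes):
            # Find punctuation trailing the tag, e.g. "," in "@Simon,".
            core = word.rstrip(keep)
            trailing = word[len(core):]
            # Attach it to the previous kept word, if there is one.
            if trailing and kept:
                kept[-1] += trailing
        else:
            kept.append(word)
    return ' '.join(kept)

print(strip_tagged_words('Hello @Simon, I had a great day today. #StackOverflow'))
print(strip_tagged_words('hello #hiya #ello'))
```

Punctuation following a tag at the very start of the tweet is simply dropped, since there is no earlier word to attach it to.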
Upvotes: 1