Reputation: 161
I want to remove the hashtag symbol ('#') and replace the underscores that separate words ('_') with spaces.
Example: "this tweet is example #key1_key2_key3"
The result I want: "this tweet is example key1 key2 key3"
My code, using the string module:
import string
# Remove punctuation, including the hashtag symbol
translate_table = dict((ord(char), None) for char in string.punctuation)
cleaned_combined_tweets.translate(translate_table)
This gives the result: "this tweet is example key1key2key3"
Upvotes: 6
Views: 12099
Reputation: 1
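You can use the re module. Remove '#' first, then replace each '_' with a space (replacing every non-letter would also wipe out the digits in key1, key2, key3):
import re
a = re.sub('_', ' ', re.sub('#', '', a))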
Upvotes: 0
Reputation: 481
Assuming that there will be only # and _ as punctuation:
import re
tweet = "this tweet is example #key1_key2_key3"
new_tweet = " ".join(word.strip() for word in re.split('#|_', tweet))
print(new_tweet)
Out: this tweet is example key1 key2 key3
Upvotes: 0
Reputation: 11
You can use the re module:
import re
a = 'this tweet is example #key1_key2_key3 sdasd #key1_key2_key3'
def get_all_hashtags(text):
    # \w already matches underscores, so '#\w+' covers the whole hashtag
    hash_pattern = re.compile(r'#\w+')
    return hash_pattern.findall(text)
def clean_hashtags(hashtag, return_list=False):
    # Drop the '#', then split on runs of underscores
    words = re.split(r'_+', hashtag.replace('#', ''))
    # return_list just in case you want a list
    if return_list:
        return words
    else:
        return ' '.join(words)
print([clean_hashtags(h, True) for h in get_all_hashtags(a)])
print([clean_hashtags(h) for h in get_all_hashtags(a)])
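If the goal is to rewrite the tweet in place rather than extract the hashtags, a hashtag pattern like the one above could be passed to re.sub with a callback. This is only a sketch: clean_tweet and HASHTAG are hypothetical names, and it assumes underscores outside hashtags should be left alone.

```python
import re

# Hypothetical names for illustration; not part of the answer above.
HASHTAG = re.compile(r'#(\w+)')

def clean_tweet(text):
    # For each hashtag, drop the '#' and join its underscore-separated
    # parts with spaces; text outside hashtags is left untouched.
    def expand(match):
        parts = [p for p in match.group(1).split('_') if p]
        return ' '.join(parts)
    return HASHTAG.sub(expand, text)

print(clean_tweet('this tweet is example #key1_key2_key3'))
# this tweet is example key1 key2 key3
```

Because only text matched by the hashtag pattern is rewritten, a word such as snake_case elsewhere in the tweet keeps its underscore.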
Upvotes: 0
Reputation: 27
First remove all hash symbols, then replace all underscores with spaces; a simple and easy solution. str.replace is used rather than str.strip because strip only removes characters from the ends of the string, and the '#' here is in the middle.
Revised code:
tweet = "This tweet is example #key1_key2_key3"
tweet = tweet.replace("#", "")
tweet = tweet.replace("_", " ")
print(tweet)
Upvotes: 1
Reputation: 44505
>>> "this tweet is example #key1_key2_key3".replace("#", "").replace("_", " ")
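The same two substitutions can also be done in a single pass with str.translate, which is what the code in the question was attempting. A minimal sketch, assuming '#' and '_' are the only characters that need handling (the variable names are illustrative):

```python
# Map '#' to None (delete it) and '_' to a space.
# Assumption: no other punctuation needs to be handled.
table = str.maketrans({'#': None, '_': ' '})

tweet = "this tweet is example #key1_key2_key3"
cleaned = tweet.translate(table)
print(cleaned)  # this tweet is example key1 key2 key3
```

The original table mapped every character in string.punctuation to None, and '_' is part of string.punctuation, which is why the words ended up glued together.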
Upvotes: 4