Noura

Reputation: 161

Python: remove hashtag symbol and keep keywords

I want to remove the hashtag symbol ('#') and the underscores ('_') that separate the words.

Example: "this tweet is example #key1_key2_key3"

The result I want: "this tweet is example key1 key2 key3"

My code, using the string module:

import string

# Remove punctuation, including the '#' hashtag symbol
translate_table = dict((ord(char), None) for char in string.punctuation)
cleaned_combined_tweets = cleaned_combined_tweets.translate(translate_table)

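This gives the result "this tweet is example key1key2key3": the underscores vanish instead of becoming spaces, because '_' is itself in string.punctuation and is deleted along with '#'.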
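One way to keep the translate approach, offered as a sketch rather than a definitive fix, is to build the table with str.maketrans so that '_' maps to a space while every other punctuation character is deleted. The helper name clean_tweet is hypothetical.

```python
import string

# str.maketrans(x, y, z): each char in x maps to the char at the same index in y,
# and each char in z maps to None (deleted). '_' is taken out of z so it is
# replaced by a space rather than deleted.
TABLE = str.maketrans("_", " ", string.punctuation.replace("_", ""))

def clean_tweet(text):
    # Hypothetical helper: apply the translation table to one tweet
    return text.translate(TABLE)

print(clean_tweet("this tweet is example #key1_key2_key3"))
# this tweet is example key1 key2 key3
```

Keep in mind that this still deletes every other punctuation character, so an apostrophe in "don't" would disappear too.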

Upvotes: 6

Views: 12099

Answers (5)

Rithin gullapalli

Reputation: 1

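You can use the re module. Note that a class like [^a-zA-Z] also matches digits and would turn key1 into "key ", so target only '#' and '_':

import re
a = re.sub('_', ' ', re.sub('#', '', a))  # drop '#', then turn each '_' into a space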
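If only the underscores inside hashtags should become spaces (leaving an unrelated snake_case word elsewhere in the tweet alone), re.sub also accepts a function as the replacement. A minimal sketch, where expand_hashtags is a hypothetical name:

```python
import re

# '#' followed by one or more word characters; \w also matches '_' and digits
HASHTAG = re.compile(r"#(\w+)")

def expand_hashtags(text):
    # For each match, drop the '#' and turn the hashtag's underscores into spaces
    return HASHTAG.sub(lambda m: m.group(1).replace("_", " "), text)

print(expand_hashtags("this tweet is example #key1_key2_key3"))
# this tweet is example key1 key2 key3
```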

Upvotes: 0

Vidya P V

Reputation: 481

Assuming that '#' and '_' are the only punctuation characters in the tweet:

import re

tweet = "this tweet is example #key1_key2_key3"
# Split on '#' or '_', trim each piece, and rejoin with single spaces
new_tweet = " ".join(word.strip() for word in re.split('#|_', tweet))
print(new_tweet)

Output: this tweet is example key1 key2 key3

Upvotes: 0

machine_learner

Reputation: 11

You can use the re module to pull out each hashtag and split it into words:

import re

a = 'this tweet is example #key1_key2_key3 sdasd #key1_key2_key3'

def get_all_hashtags(text):
    # Raw string avoids invalid-escape warnings; \w already matches '_'
    hash_pattern = re.compile(r'#\w+')
    return hash_pattern.findall(text)

def clean_hashtags(hashtag, return_list=False):
    # return_list: return the words as a list instead of a joined string
    words = re.split(r'_+', hashtag.replace('#', ''))
    if return_list:
        return words
    return ' '.join(words)

print([clean_hashtags(h, True) for h in get_all_hashtags(a)])
# [['key1', 'key2', 'key3'], ['key1', 'key2', 'key3']]
print([clean_hashtags(h) for h in get_all_hashtags(a)])
# ['key1 key2 key3', 'key1 key2 key3']

Upvotes: 0

PythonUser

Reputation: 27

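First remove all hashtag symbols, then replace all underscores with spaces; a simple and easy solution. Note that str.strip("#") would not work here: it only removes '#' from the two ends of the whole string, and this string starts with 'T' and ends with '3'.

Revised code:

text = "This tweet is example #key1_key2_key3"
text = text.replace("#", "")    # remove every '#'
text = text.replace("_", " ")   # underscores become spaces
print(text)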
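Because str.strip and str.lstrip only look at the ends of the string they are called on, one way to remove '#' only where it starts a word (so something like "C#" survives) is to apply lstrip to each word. A minimal sketch; strip_hashtags is a hypothetical name, and text.split() with no argument also collapses runs of whitespace into single spaces.

```python
def strip_hashtags(text):
    # lstrip('#') removes leading '#' characters from each whitespace-separated word
    words = (word.lstrip("#") for word in text.split())
    # Rejoin the words, then turn underscores into spaces
    return " ".join(words).replace("_", " ")

print(strip_hashtags("This tweet is example #key1_key2_key3"))
# This tweet is example key1 key2 key3
```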

Upvotes: 1

pylang

Reputation: 44505

>>> "this tweet is example #key1_key2_key3".replace("#", "").replace("_", " ")
'this tweet is example key1 key2 key3'

Upvotes: 4
