Reputation: 83
This is the result of print(df['Title']).
I am using a regex to replace the unwanted characters with a space:
import re
import pandas as pd

def remove_punctuations(text):
    return re.sub(r']!@-#$%^&*(){};:,./<>?\|`~=_+',' ',text)
df1 = pd.read_csv(file2)
print(df1["Title"])
df1['Title'] = df1['Title'].apply(remove_punctuations)
print(df1["Title"])
The titles are printed unchanged. What am I doing wrong? Could anyone point it out? Regards,
Upvotes: 0
Views: 166
Reputation: 164
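Your regular expression is not a set of characters to remove. Outside a character class, the pattern ]!@-#$%^&*(){};:,./<>?\|`~=_+ is read as one sequence (with several of its characters interpreted as regex metacharacters), so it only substitutes a blank " " where that whole chain matches, which never happens in your titles.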
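To illustrate the difference between a sequence and a character class, here is a minimal sketch. It uses a shortened, hypothetical pattern (just !, @ and #) and a made-up sample string rather than the original data:

```python
import re

# Made-up sample text, not taken from the question's CSV file.
text = "a!b@c#d !@# e"

# Written as a plain sequence, the pattern only matches the literal run "!@#".
as_sequence = re.sub(r'!@#', ' ', text)

# Written as a character class, each of the three characters matches on its own.
as_class = re.sub(r'[!@#]', ' ', text)

print(repr(as_sequence))
print(repr(as_class))
```

Only the character-class version touches the isolated !, @ and # characters; the sequence version leaves them alone.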
Replace your function with:
def remove_punctuations(text):
    return re.sub(r'[^\w\s]',' ',text)
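Here [^\w\s] matches any single character that is neither a word character (letter, digit, or underscore) nor whitespace, so every punctuation character is replaced individually. Note that the underscore counts as a word character and is therefore left in place; use [^\w\s]|_ if it should be removed as well.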
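A quick way to check this behaviour is the sketch below. The sample string and the second function name, remove_punctuations_and_underscores, are invented for illustration and are not part of the original answer:

```python
import re

def remove_punctuations(text):
    # Replace every character that is neither a word character nor whitespace.
    return re.sub(r'[^\w\s]', ' ', text)

def remove_punctuations_and_underscores(text):
    # Hypothetical variant: also treat the underscore as punctuation.
    return re.sub(r'[^\w\s]|_', ' ', text)

sample = "Hello, world! foo_bar"
print(repr(remove_punctuations(sample)))
print(repr(remove_punctuations_and_underscores(sample)))
```

Each removed character becomes its own space, so a punctuation mark next to an existing space leaves two spaces in a row.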
Upvotes: 1
Reputation: 520908
You should enclose the special characters in a character class, which is denoted by square brackets [...]:
def remove_punctuations(text):
    return re.sub(r'\s*[\[\]!@#$%^&*(){};:,./<>?\|`~=_+-]\s*', ' ', text).strip()
Note that this replaces each special character, together with any whitespace around it, with a single space. For the edge cases where a special character starts or ends the input, strip() removes the resulting leading or trailing space.
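As a rough check of the edge cases described above, the sketch below runs the function on a made-up string that starts and ends with a special character. The constant name PATTERN is introduced here only for readability:

```python
import re

# Same character class as in the answer, stored in a (hypothetical) constant.
PATTERN = r'\s*[\[\]!@#$%^&*(){};:,./<>?\|`~=_+-]\s*'

def remove_punctuations(text):
    # Swap each special character plus surrounding whitespace for one space,
    # then trim the space left behind at either end of the string.
    return re.sub(PATTERN, ' ', text).strip()

print(repr(remove_punctuations("!Hello , world?")))
```

Because the \s* on both sides absorbs neighbouring whitespace, " , " collapses to a single space instead of three.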
Upvotes: 1