Reputation: 696
I have a pandas dataframe column like below :
| ColumnA |
+-------------+
| ABCD(!) |
| <DEFG>(23) |
| (MNPQ. ) |
| 32.JHGF |
| "QWERT" |
Aim is to remove the special characters and produce the output as below :
| ColumnA |
+------------+
| ABCD |
| DEFG |
| MNPQ |
| JHGF |
| QWERT |
Tried using the replace
method like below, but without success :
df['ColumnA'] = df['ColumnA'].str.replace(r"[^a-zA-Z\d\_]+", "", regex=True)
print(df)
So, how can I replace the special characters using replace method in pandas?
Upvotes: 0
Views: 271
Reputation: 598
Your suggested code works fine on my installation with only extra digits so that you need to update your regex statement: r"[^a-zA-Z]+" If this doesn't work, then maybe try to update your pandas;
import pandas as pd
d = {'Column A': [' ABCD(!)', '<DEFG>(23)', '(MNPQ. )', ' 32.JHGF', '"QWERT"']}
df = pd.DataFrame(d)
df['ColumnA'] = df['ColumnA'].str.replace(r"[^a-zA-Z]+", "", regex=True)
print(df)
Output
Column A
0 ABCD
1 DEFG
2 MNPQ
3 JHGF
4 QWERT
Upvotes: 0
Reputation: 1432
regrex should be r'[^a-zA-Z]+'
, it means keep only the characters that are from A to Z, a-z
import pandas as pd
# | ColumnA |
# +-------------+
# | ABCD(!) |
# | <DEFG>(23) |
# | (MNPQ. ) |
# | 32.JHGF |
# | "QWERT" |
# create a dataframe from a list
df = pd.DataFrame(['ABCD(!)', 'DEFG(23)', '(MNPQ. )', '32.JHGF', 'QWERT'], columns=['ColumnA'])
# | ColumnA |
# +------------+
# | ABCD |
# | DEFG |
# | MNPQ |
# | JHGF |
# | QWERT |
# keep only the characters that are from A to Z, a-z
df['ColumnB'] =df['ColumnA'].str.replace(r'[^a-zA-Z]+', '')
print(df['ColumnB'])
Result:
0 ABCD
1 DEFG
2 MNPQ
3 JHGF
4 QWERT
Upvotes: 1
Reputation: 863031
Your solution is also for get numbers \d
and _
, so it remove only:
df['ColumnA'] = df['ColumnA'].str.replace(r"[^a-zA-Z]+", "")
print (df)
ColumnA
0 ABCD
1 DEFG
2 MNPQ
3 JHGF
4 QWERT
Upvotes: 1