Reputation: 13
I am trying to remove punctuation and numbers from a column of my pandas DataFrame. Here is a sample of my code:
import re
import string
df.text = df.text.apply(lambda x: x.lower())
df.text = df.text.apply(lambda x: x.translate(None, string.punctuation))
It gives me this error:
TypeError: translate() takes exactly one argument (2 given)
I tried removing None from the translate call, so it becomes:
df.text = df.text.apply(lambda x: x.translate(string.punctuation))
This runs without errors, but it does not remove the punctuation as I wanted. I am using Python 2.7. Can you help me? Thank you in advance.
Upvotes: 0
Views: 469
Reputation: 4792
Try this for Python 2:
import pandas as pd
df = pd.DataFrame({'text': ['f!!o..o!', 'b""a??r', 'b?.?a!.!z']})
text
0 f!!o..o!
1 b""a??r
2 b?.?a!.!z
import string
table = string.maketrans("","")
df.text = df.text.apply(lambda x: x.translate(table, string.punctuation))
df
text
0 foo
1 bar
2 baz
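The string.maketrans function builds a translation table, which works like a dictionary (each character is translated to another character). Calling it with two empty strings gives a table that maps every character to itself, so nothing is substituted; the second argument to translate then lists the characters to delete.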
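The error in the question ("takes exactly one argument") is the one Python 2 raises from unicode.translate, so the column may hold unicode strings rather than str; that is an assumption, since the question does not show the data. In that case a table built as a dict from code points to None is one option. A minimal sketch, where strip_punctuation is a hypothetical helper name:

```python
import string

# Map each punctuation character's code point to None; for unicode
# strings (Python 2) and str (Python 3), translate() deletes any
# character whose code point maps to None.
delete_punct = {ord(ch): None for ch in string.punctuation}

def strip_punctuation(text):
    # Hypothetical helper: expects a unicode string on Python 2
    # (a plain Python 2 str needs the maketrans approach instead).
    return text.translate(delete_punct)

print(strip_punctuation(u'f!!o..o!'))  # foo
```

With a DataFrame this would be applied as df.text = df.text.apply(strip_punctuation).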
Upvotes: 0
Reputation: 16593
You can use pandas' built-in Series.str.translate:
In [1]: import pandas as pd
In [2]: df = pd.DataFrame({'text': ['f!!o..o!', 'b""a??r', 'b?.?a!.!z']})
In [3]: df
Out[3]:
text
0 f!!o..o!
1 b""a??r
2 b?.?a!.!z
In [4]: import string
In [5]: df.text = df.text.str.translate(None, string.punctuation)
In [6]: df
Out[6]:
text
0 foo
1 bar
2 baz
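The question also asks to remove numbers, which the examples above do not cover. On Python 3 (not the asker's 2.7, so this is a sketch for a different setup), the three-argument form of str.maketrans can delete digits and punctuation in one pass. The name clean_text is hypothetical:

```python
import string

# Python 3 only: the third argument to str.maketrans lists the
# characters that translate() should delete.
table = str.maketrans('', '', string.punctuation + string.digits)

def clean_text(text):
    # Hypothetical helper: lowercase the text, then drop
    # punctuation and digits.
    return text.lower().translate(table)

print(clean_text('He11o, W0rld!'))  # heo wrld
```

With a DataFrame, the same table can be passed to the string accessor, for example df.text = df.text.str.lower().str.translate(table).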
Upvotes: 1