Reputation: 63
I'm getting the error:
TypeError: expected a character buffer object
on the line words = user_input_txt.translate(translate_table).lower().split(). I checked the type of the argument user_input_txt and it is unicode. I'm not sure what I'm doing wrong, and I don't quite understand the previous postings on this. If someone could advise on how to fix it, I would greatly appreciate it!
def contains_bad_words(user_input_txt):
    """ remove punctuation from text
        and make it case-insensitive"""
    translate_table = dict((ord(char), None) for char in string.punctuation)
    words = user_input_txt.translate(translate_table).lower().split()
    for bad_word in blacklist:
        for word in words:
            if word == bad_word:
                return True
    return False
EDIT: I've revised my solution per the recommendation from Daniel. However, I'm now getting the error:
TypeError: maketrans() takes exactly 2 arguments (1 given)
Could someone please advise what I'm doing wrong? I read that string.maketrans can take one argument as long as it's a dict, and translate_table is a dictionary, isn't it? Please help!
def contains_bad_words(user_input_txt):
    """ remove punctuation from text
        and make it case-insensitive"""
    translate_table = dict((ord(char), None) for char in string.punctuation)
    translate_table_new = string.maketrans(translate_table)
    words = user_input_txt.translate(translate_table_new).lower().split()
    for bad_word in blacklist:
        for word in words:
            if word == bad_word:
                return True
    return False
SECOND EDIT: I fixed the problem by converting the Unicode string to a byte string and passing two arguments to maketrans. However, I'm still puzzled why my solution above doesn't work. I read somewhere that maketrans can take one argument provided it is a dictionary, which is what I passed. Could someone explain why the code above doesn't work but the code below does?
def contains_bad_words(user_input_txt):
    """ remove punctuation from text
        and make it case-insensitive"""
    user_typ = user_input_txt.encode()
    translate_table_new = maketrans(string.punctuation, 32*" ")
    words = user_typ.translate(translate_table_new).lower().split()
    for bad_word in blacklist:
        for word in words:
            if word == bad_word:
                return True
    return False
Upvotes: 2
Views: 1104
Reputation: 177891
Your code samples aren't complete examples, and it matters what your input is.
There are two versions of translate in Python 2: str.translate and unicode.translate. Here's the help on both:
>>> help(str.translate)
Help on method_descriptor:

translate(...)
    S.translate(table [,deletechars]) -> string

    Return a copy of the string S, where all characters occurring
    in the optional argument deletechars are removed, and the
    remaining characters have been mapped through the given
    translation table, which must be a string of length 256 or None.
    If the table argument is None, no translation is applied and
    the operation simply removes the characters in deletechars.

>>> help(unicode.translate)
Help on method_descriptor:

translate(...)
    S.translate(table) -> unicode

    Return a copy of the string S, where all characters have been mapped
    through the given translation table, which must be a mapping of
    Unicode ordinals to Unicode ordinals, Unicode strings or None.
    Unmapped characters are left untouched. Characters mapped to None
    are deleted.
If you have a byte string (str), then the table that translate requires must be a byte string of length 256 or None. An optional second argument to .translate deletes characters.
string.maketrans can generate the 256-byte string. It takes two arguments that must be the same length. Here's the help:
>>> import string
>>> help(string.maketrans)
Help on built-in function maketrans in module strop:

maketrans(...)
    maketrans(frm, to) -> string

    Return a translation table (a string of 256 bytes long)
    suitable for use in string.translate. The strings frm and to
    must be of the same length.
Demo (a->1, b->2, c->3, delete d, e, and f):
>>> import string
>>> test = 'abcdefg' # byte string in Python 2
>>> test.translate(string.maketrans('abc','123'),'def')
'123g'
The unicode version takes a dictionary of Unicode ordinals to Unicode ordinals, Unicode strings, or None.
Demo (change a->b, c->xxx, and delete d):
>>> test = u'abcdefg' # Unicode string in Python 2
>>> xlat = {ord('a'):ord('b'),ord('c'):u'xxx',ord('d'):None}
>>> test.translate(xlat)
u'bbxxxefg'
So for your examples, you want to delete punctuation. Depending on whether you have a Unicode string or a byte string, choose one of the following:
>>> import string
>>> translate_table = dict((ord(char), None) for char in string.punctuation)
>>> u'abcd.,def'.translate(translate_table)
u'abcddef'
>>> import string
>>> 'abcd.,def'.translate(None,string.punctuation)
'abcddef'
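Putting the two cases together, here is a minimal sketch of the function from the question that accepts either input type. The helper name strip_punctuation and passing blacklist as a parameter are assumptions made for illustration, not part of the original code. It also assumes the blacklist entries are lowercase and of the same string type as the input. The sketch is written so that it should run on Python 2.6+ as well as Python 3, where bytes and str play the roles of str and unicode.

```python
import string

# Characters to delete, as a byte string (for byte-string input).
_DELETE_BYTES = string.punctuation.encode("ascii")
# Ordinal -> None mapping (for Unicode input); None means "delete".
_DELETE_TABLE = dict((ord(ch), None) for ch in string.punctuation)


def strip_punctuation(text):
    """Hypothetical helper: remove ASCII punctuation from text."""
    if isinstance(text, bytes):  # `bytes` is an alias of `str` in Python 2
        # Byte strings: no translation table, only a set of bytes to delete.
        return text.translate(None, _DELETE_BYTES)
    # Unicode strings: a dict mapping ordinals to None.
    return text.translate(_DELETE_TABLE)


def contains_bad_words(user_input_txt, blacklist):
    """Return True if any word of the input is in blacklist."""
    words = strip_punctuation(user_input_txt).lower().split()
    return any(word in blacklist for word in words)
```

For example, contains_bad_words(u"Well, DARN!", [u"darn"]) would be expected to return True, because the comma and exclamation mark are removed before the text is lowercased and split.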
Upvotes: 2
Reputation: 599758
.translate does not take a dict directly. You need to run it through str.maketrans first.
translate_table = dict((ord(char), None) for char in string.punctuation)
translate_table = str.maketrans(translate_table)
words = user_input_txt.translate(translate_table).lower().split()
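Note that str.maketrans exists only in Python 3. In Python 2 the closest function is string.maketrans, which takes two equal-length strings and rejects a dict. This is likely where the "one dict argument" idea in the question comes from. A minimal sketch, assuming Python 3 and a made-up sample sentence, showing the one-argument dict form and the equivalent three-argument form:

```python
import string

# Python 3 only: a single dict argument mapping ordinals to None (delete).
table_from_dict = str.maketrans(
    dict((ord(char), None) for char in string.punctuation)
)

# Equivalent three-argument form: the third argument lists characters to delete.
table_from_args = str.maketrans("", "", string.punctuation)

sample = "Hello, World!"  # sample input, chosen for illustration
words = sample.translate(table_from_dict).lower().split()
same = sample.translate(table_from_args).lower().split()
```

Both tables should yield the same list of words for this input.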
Upvotes: 0