stud333

Reputation: 63

TypeError: expected a character buffer object using .translate

I'm getting the error:

TypeError: expected a character buffer object

The error occurs on the line words=user_input_txt.translate(translate_table).lower().split(). I checked the type of the argument user_input_txt and it is unicode. I'm not sure what I'm doing wrong, and I don't quite understand the previous postings on this. If someone could advise on how to fix it, I would greatly appreciate it!

def contains_bad_words(user_input_txt):
    """ remove punctuation from text 
        and make it case-insensitive"""
    translate_table = dict((ord(char), None) for char in string.punctuation)
    words = user_input_txt.translate(translate_table).lower().split()
    for bad_word in blacklist:
        for word in words:
            if word == bad_word:
                return True
    return False

EDIT: I've revised my solution per the recommendation received from Daniel. However, I'm now getting the error:

TypeError: maketrans() takes exactly 2 arguments (1 given).

Could someone please advise what I'm doing wrong? I read that string.maketrans can take one argument as long as it's a dict. But translate_table is a dictionary, no? Please help!

def contains_bad_words(user_input_txt):
    """ remove punctuation from text 
        and make it case-insensitive"""
    translate_table = dict((ord(char), None) for char in string.punctuation)
    translate_table_new = string.maketrans(translate_table)
    words = user_input_txt.translate(translate_table_new).lower().split()
    for bad_word in blacklist:
        for word in words:
            if word == bad_word:
                return True
    return False

SECOND EDIT: I fixed the problem by converting the unicode string to a byte string and changing the number of arguments to maketrans. However, I'm still very puzzled why my solution above doesn't work. I read somewhere that maketrans can take one argument provided it is a dictionary, which is clearly what I passed. Could someone help explain why the above doesn't work but the below does:

def contains_bad_words(user_input_txt):
    """ remove punctuation from text
        and make it case-insensitive"""
    user_typ = user_input_txt.encode()
    translate_table_new = string.maketrans(string.punctuation, 32*" ")
    words = user_typ.translate(translate_table_new).lower().split()
    for bad_word in blacklist:
        for word in words:
            if word == bad_word:
                return True
    return False

Upvotes: 2

Views: 1104

Answers (2)

Mark Tolonen

Reputation: 177891

Your code samples aren't complete examples, and it matters what type your input is.

There are two versions of translate in Python 2: str.translate and unicode.translate. Here's the help on both:

>>> help(str.translate)
Help on method_descriptor:

translate(...)
    S.translate(table [,deletechars]) -> string

    Return a copy of the string S, where all characters occurring
    in the optional argument deletechars are removed, and the
    remaining characters have been mapped through the given
    translation table, which must be a string of length 256 or None.
    If the table argument is None, no translation is applied and
    the operation simply removes the characters in deletechars.

>>> help(unicode.translate)
Help on method_descriptor:

translate(...)
    S.translate(table) -> unicode

    Return a copy of the string S, where all characters have been mapped
    through the given translation table, which must be a mapping of
    Unicode ordinals to Unicode ordinals, Unicode strings or None.
    Unmapped characters are left untouched. Characters mapped to None
    are deleted.

If you have a byte string (str), the table that .translate requires must be a byte string of length 256 or None. An optional second argument to .translate lists characters to delete.

string.maketrans can generate the 256-byte string. It takes two arguments that must be the same length. Here's the help:

>>> import string
>>> help(string.maketrans)
Help on built-in function maketrans in module strop:

maketrans(...)
    maketrans(frm, to) -> string

    Return a translation table (a string of 256 bytes long)
    suitable for use in string.translate.  The strings frm and to
    must be of the same length.

Demo (a->1, b->2, c->3, delete d, e, and f):

>>> import string
>>> test = 'abcdefg'  # byte string in Python 2
>>> test.translate(string.maketrans('abc','123'),'def')
'123g'
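For comparison, the one-argument dict form mentioned in the question does exist, but only in Python 3, where it is str.maketrans rather than string.maketrans. That is likely the source of the confusion. A minimal sketch, assuming a Python 3 interpreter (it will not run under the Python 2 used elsewhere in this answer):

```python
# Python 3 only. In Python 2, string.maketrans always needs two arguments.
import string

# One-argument form: a dict mapping characters (or ordinals) to
# replacement ordinals, replacement strings, or None (meaning "delete").
table = str.maketrans({ch: None for ch in string.punctuation})
cleaned = "abcd.,def".translate(table)
print(cleaned)

# The two-argument, 256-byte table lives on bytes in Python 3.
byte_table = bytes.maketrans(b"abc", b"123")
translated = b"abcdefg".translate(byte_table, b"def")
print(translated)
```

The byte-string half mirrors the Python 2 demo above: same mapping, same deleted characters.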

The unicode version takes a dictionary of Unicode ordinals to Unicode ordinals, Unicode strings, or None.

Demo (change a->b, c->xxx, and delete d):

>>> test = u'abcdefg' # Unicode string in Python 2
>>> xlat = {ord('a'):ord('b'),ord('c'):u'xxx',ord('d'):None}
>>> test.translate(xlat)
u'bbxxxefg'

So for your examples, you want to delete punctuation. Depending on whether you have a Unicode string or a byte string, choose one of the following:

>>> import string
>>> translate_table = dict((ord(char), None) for char in string.punctuation)
>>> u'abcd.,def'.translate(translate_table)
u'abcddef'

>>> import string
>>> 'abcd.,def'.translate(None,string.punctuation)
'abcddef'
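Putting this together, one way to avoid the type mismatch is to normalise the input to a Unicode string first and then always use the dict-based table. This is only a sketch: the BLACKLIST contents and the UTF-8 decoding are assumptions for illustration, since the question shows neither the real blacklist nor the input encoding.

```python
import string

# Hypothetical blacklist; the real one is not shown in the question.
BLACKLIST = {"spam", "eggs"}

# Unicode ordinals mapped to None, so translate() deletes punctuation.
PUNCT_TABLE = dict((ord(ch), None) for ch in string.punctuation)


def contains_bad_words(user_input_txt, blacklist=BLACKLIST):
    """Return True if any blacklisted word appears in the text,
    ignoring ASCII punctuation and letter case."""
    # Byte strings are decoded first (UTF-8 is an assumption) so that
    # the dict-based translate table is always applied to Unicode text.
    if isinstance(user_input_txt, bytes):
        user_input_txt = user_input_txt.decode("utf-8")
    words = user_input_txt.translate(PUNCT_TABLE).lower().split()
    return any(word in blacklist for word in words)
```

Because the byte-string branch decodes before translating, the same function accepts either kind of input, for example contains_bad_words(u"Spam, please!") or contains_bad_words(b"no problem here.").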

Upvotes: 2

Daniel Roseman

Reputation: 599758

.translate does not take a dict directly. You need to run it through str.maketrans first.

translate_table = dict((ord(char), None) for char in string.punctuation)
translate_table = str.maketrans(translate_table)
words = user_input_txt.translate(translate_table).lower().split()

Upvotes: 0
