encoding problem with pgsql/python?

Question

I retrieved a bunch of text records from my PostgreSQL database and intend to preprocess these text documents before analyzing them.

I want to tokenize the documents, but I ran into a problem during tokenizing:

    # a bunch of other regex replacements come before this
    # toTokens is the text string
    # raw strings keep \1 and \2 as backreferences instead of control characters
    toTokens = self.regexClitics1.sub(r" \1", toTokens)
    toTokens = self.regexClitics2.sub(r" \1 \2", toTokens)

    toTokens = str.strip(toTokens)

The error is: TypeError: descriptor 'strip' requires a 'str' object but received a 'unicode'. I'm curious: why does this error occur when the encoding of the database is UTF-8?

Samuel · Accepted Answer

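Why don't you use toTokens.strip()? There is no need to go through the str type. str.strip is a method of str, so it only accepts str objects, while the method called on the value itself works for whatever string type the value happens to be.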

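There are 2 string types in Python 2, str and unicode. A str holds bytes and a unicode holds decoded text. Judging by the error message, your records come back from the database as unicode objects, i.e. the UTF-8 data has already been decoded, which is why str.strip rejects them. Look at this for an explanation.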
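As a minimal sketch of the difference: the pattern and the function name below are made up for illustration (they stand in for regexClitics1 and the tokenizing code, which are not shown in the question). The snippet is written so that it should run on both Python 2 and Python 3, which is why the "wrong" string type is picked at runtime.

```python
# -*- coding: utf-8 -*-
from __future__ import print_function
import re

# Hypothetical clitic pattern, standing in for self.regexClitics1
regex_clitics = re.compile(r"('s|'re|'ve)\b")


def prepare_for_tokenizing(text):
    # Raw replacement string: \1 is a backreference to the matched clitic
    text = regex_clitics.sub(r" \1", text)
    # Calling strip() on the value itself works for str and unicode alike
    return text.strip()


sample = u"  the dog's bone  "
result = prepare_for_tokenizing(sample)
print(repr(result))

# Calling the method through the str type only accepts exact str objects.
# On Python 2 the "other" string type is unicode, on Python 3 it is bytes.
other = u" x " if str is bytes else b" x "
raised = False
try:
    str.strip(other)
except TypeError as exc:
    raised = True
    print("TypeError:", exc)
```

The first call separates the clitic and trims the surrounding whitespace; the second reproduces the same kind of TypeError as in the question, because the argument is not a str.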
