goh

Reputation: 29581

encoding problem with pgsql/python?

I retrieved a bunch of text records from my postgresql database and intend to preprocess these text documents before analyzing them.

I want to tokenize the documents, but I ran into a problem during tokenization:

    # a bunch of other regex replacements come before this
    # toTokens is the text string
    toTokens = self.regexClitics1.sub(" \\1", toTokens)
    toTokens = self.regexClitics2.sub(" \\1 \\2", toTokens)

    toTokens = str.strip(toTokens)

The error is: TypeError: descriptor 'strip' requires a 'str' object but received a 'unicode'. Why does this error occur when the encoding of the database is UTF-8?

Upvotes: 0

Views: 1739

Answers (1)

Samuel

Reputation: 2490

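Why don't you just call toTokens.strip()? There is no need to go through the str type. The database encoding is not the issue here: the error is about the type of the object. str.strip accepts only str instances, and your text is a unicode object.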

There are two string types in Python 2, str and unicode. Look at this for an explanation.
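Here is a minimal sketch of the difference, not taken from your code. The variable names (`other`, `unbound_call_failed`, `cleaned`) are made up for illustration. On Python 2 it uses a unicode object, as in your case; on Python 3, where the two string types are str and bytes, it uses bytes to trigger the same kind of failure.

```python
import sys

# Pick a string type that is NOT `str` on the running interpreter:
# unicode text on Python 2 (the case in the question), bytes on Python 3.
if sys.version_info[0] == 2:
    other = u"  some text  "
else:
    other = b"  some text  "

# Calling the method through the str type only works for str instances,
# so this raises TypeError for the other string type.
try:
    str.strip(other)
    unbound_call_failed = False
except TypeError:
    unbound_call_failed = True

# Calling the method on the object itself works for either string type.
cleaned = other.strip()
```

Calling strip() on the object lets Python pick the right method for whatever string type you actually have, so the same line works whether the text arrives as str or unicode.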

Upvotes: 4
