Far

Reputation: 203

How to check, in Python, that string data is valid Unicode before inserting it into a PostgreSQL table?

I want to insert data into a PostgreSQL table. The data is a string or an array of strings. The problem is that some of the data is not valid UTF-8, which causes the following error:

DataError: invalid byte sequence for encoding "UTF8": 0xe2 0x80 0x20

The Python code I am using is the following:

cur.execute("INSERT INTO tst1 (tweet,words,tag,unknown_tags) VALUES (%s,%s,%s,%s);", (row[1],words,tags[0],tags[1:]))

In order to prevent inserting data with invalid encoding, I was wondering if there is a way in Python (or in PostgreSQL) to check the encoding of the data before inserting it into the tst1 table?
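One way to sketch this check, assuming the incoming values are byte strings (the helper name `to_utf8_text` is illustrative, not from any library):

```python
def to_utf8_text(value):
    """Return a unicode string safe to pass to the database driver.

    Byte strings are decoded as UTF-8; invalid byte sequences are
    replaced with U+FFFD instead of raising, so the INSERT cannot
    fail with "invalid byte sequence for encoding UTF8".
    """
    if isinstance(value, bytes):
        return value.decode('utf8', errors='replace')
    return value
```

If you would rather reject bad rows than repair them, use `value.decode('utf8')` without `errors='replace'` and catch `UnicodeDecodeError`.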

Upvotes: 0

Views: 232

Answers (1)

Daniel

Reputation: 42768

Always use unicode strings internally. Decode your input strings directly at the point where they enter your program, e.g.:

try:
    tags = tags.decode('utf8')
except UnicodeDecodeError:
    # do whatever you like if the input is invalid, e.g. skip or repair it
    pass
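Applied to the INSERT from the question, the decode-at-input pattern might look like the sketch below (the helper `decode_row` is illustrative; `row`, `words`, and `tags` follow the question's variable names):

```python
def decode_row(values):
    """Decode every byte string in `values` as UTF-8.

    Raises UnicodeDecodeError if any value contains an invalid
    byte sequence, so bad rows are caught before they reach the
    database.
    """
    decoded = []
    for v in values:
        if isinstance(v, bytes):
            v = v.decode('utf8')  # raises UnicodeDecodeError if invalid
        decoded.append(v)
    return decoded

# Usage sketch with the question's execute() call:
# try:
#     params = decode_row([row[1], words, tags[0]]) + [tags[1:]]
#     cur.execute("INSERT INTO tst1 (tweet,words,tag,unknown_tags) "
#                 "VALUES (%s,%s,%s,%s);", params)
# except UnicodeDecodeError:
#     pass  # log and skip the bad row
```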

Upvotes: 1
