Reputation: 203
I want to insert data into a table in PostgreSQL. The data is a string or an array of strings; the problem is that some of the words (or strings) are not valid UTF-8, which produces the following error:
DataError: invalid byte sequence for encoding "UTF8": 0xe2 0x80 0x20
The code I am using is the following (it is in Python):
cur.execute("INSERT INTO tst1 (tweet,words,tag,unknown_tags) VALUES (%s,%s,%s,%s);", (row[1],words,tags[0],tags[1:]))
To prevent inserting data with an invalid encoding, I was wondering whether there is a way in Python (or in PostgreSQL) to check the encoding of the data before inserting it into the tst1 table?
Upvotes: 0
Views: 232
Reputation: 42768
Always use Unicode strings internally. Decode your strings directly at the point of input, e.g.:
try:
    tags = tags.decode('utf8')
except UnicodeDecodeError:
    # do whatever you like when the input is invalid,
    # e.g. skip the record or substitute a cleaned value
    pass
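If you'd rather sanitize than reject, a minimal sketch (assuming Python 3, where decoding bytes with `errors='replace'` substitutes U+FFFD for invalid sequences; the `sanitize` helper name is mine, not from any library):

```python
def sanitize(value):
    """Return a valid Unicode str for insertion.

    Bytes are decoded as UTF-8, with invalid byte sequences
    replaced by U+FFFD; str values pass through unchanged.
    """
    if isinstance(value, bytes):
        return value.decode('utf-8', errors='replace')
    return value

# Valid UTF-8 bytes decode cleanly:
clean = sanitize(b'hello')           # 'hello'

# The invalid sequence from the error message is replaced
# instead of raising UnicodeDecodeError:
repaired = sanitize(b'\xe2\x80\x20')
```

Passing every value through such a helper before `cur.execute` guarantees PostgreSQL only ever sees valid UTF-8, at the cost of silently replacing the bad bytes.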
Upvotes: 1