Iron Hoang

Reputation: 179

Using unicode with SQLAlchemy and Pyramid

I have a textbox, a button, and a search function. The search function reads the keyword from the textbox and filters the query with:

queryE = queryE.filter(queryE.Campaign.CampaignName.like("%"+CampaignsKeyWord+"%"))

If the keyword contains only Latin characters, the result is correct, but when I enter other Unicode text (for example, Chinese or Japanese), it doesn't work.

For example, おはようございます is a string in my database. When I type います to search, I expect it to match and return おはようございます, but it doesn't. When I print おはようございます, I see ã?Šã?¯ã‚ˆã?†ã?”ã?–ã?„ã?¾ã?™ on the screen.

Upvotes: 0

Views: 348

Answers (2)

Dave Anderson

Reputation: 118

If your Pyramid template or response for that page doesn't have the correct character encoding set, the text can become garbled. I could be wrong, but from the information provided it sounds like the problem has more to do with the HTML document than with Pyramid or SQLAlchemy. If Python had a problem decoding text such as Japanese, it would likely raise an error such as UnicodeDecodeError rather than output garbled text.
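
As a rough illustration of that last point, here is a minimal sketch in plain Python 3 (no Pyramid or SQLAlchemy involved). The helper name `try_decode` is made up for this example. It shows that a strict decode with the wrong codec fails loudly instead of silently producing garbled text:

```python
# Sketch only: strict decoding raises an error rather than producing mojibake.
text = "おはようございます"
raw = text.encode("utf-8")  # the bytes a UTF-8 program would emit


def try_decode(data, encoding):
    """Hypothetical helper: return the decoded text, or the error's class name."""
    try:
        return data.decode(encoding)
    except UnicodeDecodeError as exc:
        return type(exc).__name__


strict_ascii = try_decode(raw, "ascii")  # wrong codec, strict mode
round_trip = try_decode(raw, "utf-8")    # correct codec

print(strict_ascii)
print(round_trip == text)
```

Garbled output like the one in the question usually means the bytes were fine and only the reader picked the wrong encoding.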

If you are using a template in Pyramid, such as Chameleon, it may have the wrong encoding set in its meta tag. If so, try switching to 'utf-8' in the template, similar to:

<meta charset="utf-8">

Upvotes: 1

Esailija

Reputation: 140220

Those are the UTF-8 bytes of おはようございます interpreted as Windows-1252. This is typical of a Windows terminal or a web page whose content type has no charset set. You don't have to worry, though: your program is outputting valid UTF-8. Just compare the bytes:

What you see converted to Windows-1252:

e3 3f 8a e3 3f af e3 82 88 e3 3f 86 e3 3f 94 e3 3f 96 e3 3f 84 e3 3f be e3 3f 99

Expected result in UTF-8:

e3 81 8a e3 81 af e3 82 88 e3 81 86 e3 81 94 e3 81 96 e3 81 84 e3 81 be e3 81 99

The only difference is 0x3f ("?") in place of 0x81, because 0x81 is undefined in Windows-1252.
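
The comparison above can be reproduced with a short sketch in plain Python 3. The variable names and the `hex_dump` helper are made up for this example, and `errors="replace"` is an assumption standing in for whatever the terminal or browser does with the undefined byte:

```python
# Sketch only: reproduce the two hex dumps from the comparison above.
text = "おはようございます"
utf8_bytes = text.encode("utf-8")

# Misread the UTF-8 bytes as Windows-1252. The byte 0x81 has no mapping
# there, so errors="replace" turns it into U+FFFD.
misread = utf8_bytes.decode("cp1252", errors="replace")

# Encode back to Windows-1252. U+FFFD has no mapping either, so it
# becomes "?" (0x3f).
seen_bytes = misread.encode("cp1252", errors="replace")


def hex_dump(data):
    """Hypothetical helper: format bytes as space-separated hex pairs."""
    return " ".join(f"{b:02x}" for b in data)


print(hex_dump(seen_bytes))
print(hex_dump(utf8_bytes))
```

Every byte other than 0x81 survives the round trip, which is why the garbled text is still recognizably the original string.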

It's just a question of declaring the encoding to the receiving end. With Pyramid you can do:

response.charset = 'utf8'

Note that this applies to a web page. If you mean a Windows terminal, you can ignore this.
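
To show what "declaring the encoding" looks like on the wire, here is a minimal sketch at the plain WSGI level rather than through Pyramid's own API. The names `app` and `fake_start_response` are made up for this example, and the response is captured in-process instead of being served:

```python
# Sketch only: a bare WSGI app that declares its charset in Content-Type.
def app(environ, start_response):
    """Hypothetical WSGI app: send UTF-8 bytes and say so in the header."""
    body = "<p>おはようございます</p>".encode("utf-8")
    headers = [
        ("Content-Type", "text/html; charset=utf-8"),
        ("Content-Length", str(len(body))),
    ]
    start_response("200 OK", headers)
    return [body]


captured = {}


def fake_start_response(status, headers):
    """Stand-in for the server's callback; records what the app sent."""
    captured["status"] = status
    captured["headers"] = dict(headers)


chunks = app({}, fake_start_response)
decoded = b"".join(chunks).decode("utf-8")

print(captured["headers"]["Content-Type"])
print(decoded)
```

With the charset present in the header, a browser has no reason to fall back to Windows-1252.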

Upvotes: 1
