David542

Reputation: 110502

Utf8 encoding with MySQLdb on non-utf symbols

I am receiving an xml feed which has values such as:

<Theme>Valentine&#39;s Day</Theme>
<Copyright>&#169; Ventures. All Rights Reserved.</Copyright>

I need to parse the values and store them in a MySQL database. What would be the best way to cleanse the values so I can insert "Valentine's Day", "<copyright symbol> Ventures. All Rights Reserved."? There are 20+ different markings like this.

Doing a straight INSERT, I get the following error:

Warning: Incorrect string value: '\xA9 1987...' for column 'title' at row 1

Upvotes: 0

Views: 199

Answers (2)

Ned Batchelder

Reputation: 376002

If you parse the XML with a real XML parser, you'll get Unicode strings as text. You can then encode them as UTF-8:

title = text.encode('utf8')

and title will be writable into your database, though many details are still unclear because we don't know how you're writing to your database.
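As a minimal sketch of that idea, assuming the feed is well-formed XML and using the standard library's xml.etree.ElementTree: the parser resolves the numeric character references itself, so no manual cleansing should be needed. The <Feed> wrapper element and the variable names are made up for illustration; only the Theme and Copyright elements come from the question.

```python
import xml.etree.ElementTree as ET

# Hypothetical feed fragment. The <Feed> wrapper is an assumption,
# added only so the snippet has a single root element.
feed = (
    "<Feed>"
    "<Theme>Valentine&#39;s Day</Theme>"
    "<Copyright>&#169; Ventures. All Rights Reserved.</Copyright>"
    "</Feed>"
)

root = ET.fromstring(feed)

# .text is already decoded: &#39; becomes an apostrophe, &#169; becomes U+00A9.
theme = root.find("Theme").text
copyright_notice = root.find("Copyright").text

# Encoding as UTF-8 produces the byte sequence a utf8 column expects.
theme_utf8 = theme.encode("utf8")
copyright_utf8 = copyright_notice.encode("utf8")
```

How those values then reach MySQL still depends on how the connection is set up, which the question does not show.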

Upvotes: 2

David542

Reputation: 110502

Specify the encoding and then encode the string to UTF-8.

# -*- coding: utf-8 -*-
title = text.encode('utf8')
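To illustrate why the encode step matters, here is a small sketch, assuming the '\xA9' in the warning is the copyright sign stored as a single Latin-1 byte. The variable names are made up for illustration.

```python
# -*- coding: utf-8 -*-
symbol = u"\u00a9"  # the copyright sign

latin1_bytes = symbol.encode("latin-1")  # one byte: 0xA9, as seen in the warning
utf8_bytes = symbol.encode("utf8")       # two bytes: 0xC2 0xA9

# A lone 0xA9 byte is not valid UTF-8, which is presumably what a
# utf8 column is rejecting when it reports "Incorrect string value".
try:
    latin1_bytes.decode("utf8")
    lone_byte_is_valid_utf8 = True
except UnicodeDecodeError:
    lone_byte_is_valid_utf8 = False
```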

Upvotes: 0
