sql-noob

Reputation: 423

Czech unicode issue in Python Django

I have the URL https://českébudějovice.mysite.com/, where the subdomain is a Czech city name. When someone accesses that URL, I extract the subdomain and use it to look up a City model object. The City model exists in Django, and I can query that city successfully from the shell:

>>> City.objects.get(name='českébudějovice')
<City: České Budějovice, Czech Republic>

However, today I received an exception in Sentry from production saying 'City matching query does not exist', and the URL is shown like this:

xn--eskbudjovice-deb41c5g.mysite.com

Obviously, I don't have a City named 'xn--eskbudjovice-deb41c5g', hence the 'City matching query does not exist' error.

I've been trying to convert that odd-looking subdomain back to the actual name, with no luck. I tried the following:

>>> s = 'xn--eskbudjovice-deb41c5g'
>>> print s.encode('utf8')
xn--eskbudjovice-deb41c5g

I'm using Cloudflare, and I wonder whether it is converting the URL to that form instead of passing it to my server as Unicode.

Upvotes: 0

Views: 922

Answers (2)

Max Malysh

Reputation: 31605

This is called Punycode, and it is a valid way of representing internationalized domain names.

In Python 2, you can decode the string using the 'idna' codec:

>>> s = 'xn--eskbudjovice-deb41c5g'
>>> print(s.decode('idna'))
českébudějovice

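If you're on Python 3, a str has no decode method, so encode the string to ASCII bytes first and decode those bytes with the 'idna' codec (calling codecs.decode on the bytes does the same thing).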
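A minimal Python 3 sketch of that decoding, using only the built-in 'idna' codec. The variable name decoded is just for illustration:

```python
# Python 3: the 'idna' codec decodes bytes, so encode the ASCII label first.
s = 'xn--eskbudjovice-deb41c5g'
decoded = s.encode('ascii').decode('idna')
print(decoded)  # českébudějovice
```

The decoded value can then be passed to the same City.objects.get(name=...) lookup from the question.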

Upvotes: 2

phd

Reputation: 94676

$ python
Python 2.7.9 (default, Aug 13 2016, 16:41:35) 

>>> 'xn--eskbudjovice-deb41c5g'.decode('idna')
u'\u010desk\xe9bud\u011bjovice'

>>> print 'xn--eskbudjovice-deb41c5g'.decode('idna')
českébudějovice
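For the subdomain lookup described in the question, the same decoding can be applied to the first label of the host name. This is only a sketch for Python 3: subdomain_from_host is a hypothetical helper, and it assumes the host string comes from something like Django's request.get_host(), possibly with a port attached.

```python
def subdomain_from_host(host):
    """Return the leftmost label of host, decoded from Punycode if needed.

    Hypothetical helper, not part of Django.
    """
    # Drop an optional ":port" suffix, then take the leftmost label.
    hostname = host.split(':', 1)[0]
    label = hostname.split('.', 1)[0].lower()
    # Punycode-encoded labels carry the "xn--" prefix; others pass through.
    if label.startswith('xn--'):
        return label.encode('ascii').decode('idna')
    return label


name = subdomain_from_host('xn--eskbudjovice-deb41c5g.mysite.com')
print(name)
```

The result could then be used as City.objects.get(name=name), so the query sees the Unicode name regardless of whether the browser or a proxy sent the host in Punycode form.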

Upvotes: 0
