pbbot

Reputation: 23

Why is my urllib.quote in python encoding from Win-1252 instead of UTF-8 for CSV file?

I've been trying to URL-encode my inputs to get them ready for an API request. urllib.quote works as expected on a string literal, percent-encoding its UTF-8 bytes, but when the value comes from a CSV file, it is encoded in a way that the API request does not recognize.

# -*- coding: utf-8 -*-
import urllib
r = "Handøl Sweden"
print urllib.quote(r)

This returns the correct format:

Hand%C3%B8l%20Sweden

Whereas:

# -*- coding: utf-8 -*-

import urllib
import csv

CityList = []

with open('SiteValidate4.csv', 'rb') as csvfile:
    CityData = csv.reader(csvfile)
    for row in CityData:
        CityList.append(row[12])
        r = row[12]
print r
print urllib.quote(r)

This returns:

Handøl Sweden
Hand%F8l%20Sweden

Are there any fixes to encode the input from the .csv file to the correct format?

Upvotes: 1

Views: 733

Answers (1)

Martijn Pieters

Reputation: 1122412

Your CSV file is encoded as CP-1252 (Windows-1252); you'd have to re-encode that data to UTF-8 before quoting it:

r = r.decode('cp1252').encode('utf8')
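
As a minimal sketch of that round trip, assuming the field holds the CP-1252 bytes seen in the question: the byte literal below is a hypothetical stand-in for row[12], and the try/except import is only there so the snippet also runs on Python 3, where quote lives in urllib.parse.

```python
try:
    from urllib import quote        # Python 2, as used in the question
except ImportError:
    from urllib.parse import quote  # Python 3 equivalent

# Hypothetical stand-in for row[12]: "Handøl Sweden" as CP-1252 bytes.
cp1252_bytes = b"Hand\xf8l Sweden"

# Quoting the raw CP-1252 bytes reproduces the unwanted result.
quoted_raw = quote(cp1252_bytes)

# Decode with the file's codec, re-encode as UTF-8, then quote.
utf8_bytes = cp1252_bytes.decode("cp1252").encode("utf8")
quoted_utf8 = quote(utf8_bytes)

print(quoted_raw)   # Hand%F8l%20Sweden
print(quoted_utf8)  # Hand%C3%B8l%20Sweden
```

The single byte F8 is ø in CP-1252, while UTF-8 represents the same character as the two bytes C3 B8, which is why the two quoted strings differ.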

Your plain Python code was using UTF-8 bytes, provided your code editor indeed saved the source file as UTF-8, as your coding: utf-8 header implies.

Just putting a PEP 263 header in your Python source file doesn't magically make all data you read from a file UTF-8 data too; it'll still need to be decoded with the correct codec for that file.
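
To illustrate, here is a small sketch, not your actual setup: it writes a hypothetical one-line sample file as CP-1252 (the file name and contents are made up), then reads it back both as raw bytes and as text decoded with the matching codec. The coding header at the top has no influence on either read.

```python
# -*- coding: utf-8 -*-
import io
import os
import tempfile

# Hypothetical sample file standing in for the CSV, saved as CP-1252.
path = os.path.join(tempfile.mkdtemp(), "sample.csv")
with io.open(path, "w", encoding="cp1252") as f:
    f.write(u"Hand\u00f8l Sweden\n")

# Binary read: the bytes on disk are still CP-1252, whatever the header says.
with io.open(path, "rb") as f:
    raw = f.read()

# Text read with the file's actual codec yields the intended characters.
with io.open(path, "r", encoding="cp1252") as f:
    text = f.read()

print(b"\xf8" in raw)                        # True: ø is stored as the single byte F8
print(text.strip() == u"Hand\u00f8l Sweden")  # True
```

The header only tells the interpreter how to decode string literals in the source file itself; data read at runtime keeps whatever encoding it was written with.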

Upvotes: 3
