Steve

Reputation: 83

Handling conversion from bytes to string when not explicitly opening a file in Python 3

I am using the Requests module to authorise and then pull CSV content from a web API, and it runs fine in Python 2.7. I now want to write the same script in Python 3.5, but I am getting this error:

"iterator should return strings, not bytes (did you open the file in text mode?)"

requests.get seems to return bytes rather than a string, which appears to be related to the encoding changes in Python 3.x. The error is raised on the third-from-last line: next(reader). In Python 2.7 this was not an issue because the csv functions were handled in 'wb' mode.

This question is very similar, but as I'm not opening a CSV file directly, I can't seem to force the response text to be handled as text in the same way: csv.Error: iterator should return strings, not bytes

import csv
import requests

countries = ['UK','US','CA']
datelist = [1,2,3,4]
baseurl = 'https://somewebsite.com/exporttoCSV.php'

#--- For all date/cc combinations
for cc in countries:
    for d in datelist:

        #---Build API String with variables
        url = (baseurl + '?data=chart&output=csv' +
               '&dataset=' + str(d) +  # d is an int, so convert before concatenating
               '&cc=' + cc)

        #---Run API Call and create reader object
        r = requests.get(url, auth=(username, password))
        text = r.iter_lines()
        reader = csv.reader(text,delimiter=',')

        #---Write csv output to csv file with territory and date columns
        with open(cc + '_' + str(d) + '.csv','wt', newline='') as file:
            a = csv.writer(file)
            a.writerow(['position','id','title','kind','peers','territory','date']) #---Write header line
            next(reader) #---Skip original headers
            for i in reader:
                a.writerow(i +[countrydict[cc]] + [datevalue])
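
For reference, the error can be reproduced without any network call. The snippet below is only a minimal sketch: `byte_lines` is a made-up stand-in for what `r.iter_lines()` yields in Python 3, and `first_row` is a hypothetical helper that exists purely for this demonstration.

```python
import csv

# Hypothetical stand-in for r.iter_lines(): an iterable of bytes, not str.
byte_lines = [b'position,id,title', b'1,42,Example']

def first_row(lines):
    """Return the first parsed row, or the csv.Error message if parsing fails."""
    try:
        return next(csv.reader(lines))
    except csv.Error as exc:
        return str(exc)

# Feeding bytes to csv.reader fails as soon as the first row is requested.
error_message = first_row(byte_lines)

# Decoding each line first gives csv.reader the str objects it expects.
decoded_row = first_row(line.decode('utf-8') for line in byte_lines)

print(error_message)
print(decoded_row)
```

The first call fails only when `next()` is invoked, which matches the error appearing at next(reader) rather than at csv.reader(...).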

Upvotes: 6

Views: 13740

Answers (2)

Bamcclur

Reputation: 2029

Without being able to test your exact scenario, I believe this should be solved by changing text = r.iter_lines() to:

text = (line.decode('utf-8') for line in r.iter_lines())

This should decode each line yielded by r.iter_lines() from a byte string to a str, which csv.reader can consume.

My test case is as follows:

>>> import csv
>>> iter_lines = [b'1,2,3,4',b'2,3,4,5',b'3,4,5,6']
>>> text = (line.decode('utf-8') for line in iter_lines)
>>> reader = csv.reader(text, delimiter=',')
>>> next(reader)
['1', '2', '3', '4']
>>> for i in reader:
...     print(i)
...
['2', '3', '4', '5']
['3', '4', '5', '6']
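
As a variation on the generator expression above, the standard library's `codecs.iterdecode` can do the same per-item decoding. This is a sketch, not a tested drop-in for the original script: `rows_from_byte_lines` is a hypothetical helper name, `sample` is made-up data standing in for `r.iter_lines()`, and UTF-8 is an assumption about the API's encoding.

```python
import codecs
import csv

def rows_from_byte_lines(byte_lines, encoding='utf-8'):
    """Wrap an iterable of bytes lines so csv.reader receives str lines."""
    # codecs.iterdecode lazily decodes each item with an incremental decoder.
    text_lines = codecs.iterdecode(byte_lines, encoding)
    return csv.reader(text_lines, delimiter=',')

# Made-up sample standing in for r.iter_lines(); includes a non-ASCII value.
sample = [b'position,id,title', b'1,42,caf\xc3\xa9']

reader = rows_from_byte_lines(sample)
header = next(reader)   # skip the original header row, as in the question
rows = list(reader)
print(header, rows)
```

Either form keeps the download lazy, so the whole response never has to be held in memory as one string.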

Upvotes: 8

Aaron Lelevier

Reputation: 20810

Some files have to be read in as bytes, for example those from Django's SimpleUploadedFile, a testing class that only uses bytes. Here is some example code from my test suite showing how I got it working:

test_code.py

import os
from django.core.files.uploadedfile import SimpleUploadedFile
from django.test import TestCase

class ImportDataViewTests(TestCase):

    def setUp(self):
        self.path = "test_in/example.csv"
        self.filename = os.path.split(self.path)[1]

    def test_file_upload(self):
        with open(self.path, 'rb') as infile:
            _file = SimpleUploadedFile(self.filename, infile.read())

        # now an `InMemoryUploadedFile` exists, so test it as you shall!

prod_code.py

import csv

def import_records(self, infile):
    csvfile = (line.decode('utf8') for line in infile)
    reader = csv.DictReader(csvfile)

    for row in reader:
        # loop through file and do stuff!
        pass
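
If the incoming object is a binary file-like object, another option is to wrap it in `io.TextIOWrapper` instead of decoding line by line. The following is a rough sketch under assumptions: `io.BytesIO` stands in for the uploaded file, the sample data is invented, and whether a particular Django upload class supports this wrapping has not been verified here.

```python
import csv
import io

def import_records(infile, encoding='utf-8'):
    """Read CSV rows as dicts from a binary file-like object."""
    # newline='' lets the csv module handle line endings itself.
    text_file = io.TextIOWrapper(infile, encoding=encoding, newline='')
    try:
        return list(csv.DictReader(text_file))
    finally:
        # Detach so the wrapper does not close infile when it is discarded.
        text_file.detach()

# io.BytesIO is a stand-in for an uploaded file opened in binary mode.
upload = io.BytesIO(b'id,title\n1,First\n2,Second\n')
records = import_records(upload)
print(records)
```

Unlike the per-line generator, this hands csv the original line endings, at the cost of requiring an object that behaves like a buffered binary stream.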

Upvotes: 3
