user43885

Reputation: 33

UnicodeDecode issue -- writing to a SAS program file

I have received a large set of SAS files which all need to have their file paths altered.

The code I've written for that task is as follows:

import glob
import os
import sys 

os.chdir(r"C:\path\subdir")
glob.glob('*.sas')
import os
fileLIST=[]
for dirname, dirnames, filenames in os.walk('.'):
    for filename in filenames:
        fileLIST.append(os.path.join(dirname, filename))
print fileLIST

import re

for fileITEM in set(fileLIST):
    dataFN=r"//path/subdir/{0}".format(fileITEM)
    dataFH=open(dataFN, 'r+')

    for row in dataFH:
        print row
        if re.findall('\.\.\.', str(row)) != []:
            dataSTR=re.sub('\.\.\.', "//newpath/newsubdir", row)
            print >> dataFH, dataSTR.encode('utf-8')
        else:
            print >> dataFH, row.encode('utf-8')
    dataFH.close()

The issues I have are twofold. First, my code does not seem to recognize the three sequential periods, even when each one is escaped with a backslash. Second, I receive the error "UnicodeDecodeError: 'ascii' codec can't decode byte..."

Is it possible that SAS program files (.sas) are not utf-8? If so, is the fix as simple as knowing what file encoding they use?

The full traceback is as follows:

Traceback (most recent call last):
  File "stringsubnew.py", line 26, in <module>
    print >> dataFH, row.encode('utf-8')
UnicodeDecodeError: 'ascii' codec can't decode byte 0x83 in position 671: ordinal not in range(128)

Thanks in advance

Upvotes: 0

Views: 374

Answers (1)

tommi00

Reputation: 44

The problem lies with reading rather than writing. You have to know which encoding the source file uses and decode its contents accordingly.

Let's say the source file contains data encoded with iso-8859-1.

You can decode each row as you read it using str.decode():

my_row = row.decode('iso-8859-1')
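As a rough illustration of why the traceback names the 'ascii' codec even though the script only calls encode: in Python 2, calling encode on a byte string makes the interpreter decode it to unicode first, and that implicit step uses ASCII. The sketch below reproduces the decode step with the byte 0x83 from the traceback. The choice of iso-8859-1 is only the assumption used in this answer, not something known about the actual files.

```python
raw = b'\x83'  # the byte reported in the traceback

# Decoding as ASCII fails for any byte above 0x7f; this mirrors the
# implicit decode Python 2 performs before it can encode a byte string.
try:
    raw.decode('ascii')
    ascii_ok = True
except UnicodeDecodeError:
    ascii_ok = False

# Decoding with an explicit 8-bit codec succeeds (codec is an assumption),
# and the resulting unicode text can then be encoded as UTF-8 safely.
text = raw.decode('iso-8859-1')
reencoded = text.encode('utf-8')
```

Once the row has been decoded explicitly, the later encode('utf-8') call no longer needs the implicit ASCII step.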

Alternatively, open the file with codecs.open() so that decoding is handled for you:

import codecs

dataFH = codecs.open(dataFN, 'r+', 'iso-8859-1')
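Putting this together, here is a minimal sketch of the whole substitution, under a few assumptions: the files really are iso-8859-1, the text to replace is the literal three periods from the question, and rewrite_paths is a hypothetical helper name. It uses io.open (which also accepts an encoding) and writes to a temporary file before swapping it into place, rather than writing to the same handle that is being read.

```python
import io
import os

# Assumption: the files are ISO-8859-1; substitute the real encoding if it differs.
SOURCE_ENCODING = 'iso-8859-1'


def rewrite_paths(path, old, new, encoding=SOURCE_ENCODING):
    """Replace every literal `old` with `new` in the file at `path`.

    Hypothetical helper for illustration; not part of any library.
    """
    # Decode while reading, so `text` is unicode rather than raw bytes.
    # newline='' keeps the original line endings untouched.
    with io.open(path, 'r', encoding=encoding, newline='') as src:
        text = src.read()

    # A plain string replace treats the periods literally, so no regex
    # escaping is needed.
    updated = text.replace(old, new)

    # Write to a temporary file first, then swap it into place.
    tmp_path = path + '.tmp'
    with io.open(tmp_path, 'w', encoding=encoding, newline='') as dst:
        dst.write(updated)
    os.remove(path)
    os.rename(tmp_path, path)
```

It could be called for each entry in fileLIST, for example rewrite_paths(fileITEM, u'...', u'//newpath/newsubdir'). The file is written back in the same encoding it was read with, so bytes such as 0x83 survive the round trip unchanged.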

A good talk on this can be found at http://nedbatchelder.com/text/unipain.html

Upvotes: 1
