user3024562

Reputation: 27

Convert all CSV files from ANSI encoding to UTF-8 using Python

I have the Python code below:

import os
from os import listdir

def find_csv_filenames( path_to_dir, suffix=".csv" ):
    filenames = listdir(path_to_dir)
    return [ filename for filename in filenames if filename.endswith( suffix ) ]
    # I always get the error on the line below
filenames = find_csv_filenames('C:\casperjs\project\teleservices\csv')
for name in filenames:
    print name

I get an error on this line:

filenames = find_csv_filenames('C:\casperjs\project\teleservices\csv')
Error message: `TabError: inconsistent use of tabs and spaces in indentation`

What I need: I want to read all the CSV files in a directory and convert them from ANSI encoding to UTF-8, but the code above only lists the name of each CSV file. What is wrong with it?

Upvotes: 0

Views: 7506

Answers (3)

Michael Kazarian

Reputation: 4462

The code below reads each CSV file line by line and prints every line re-encoded as UTF-8:

import os
from os import listdir

def find_csv_filenames(path_to_dir, suffix=".csv"):
    path_to_dir = os.path.normpath(path_to_dir)
    filenames = listdir(path_to_dir)
    # Keep regular files only, so a directory whose name ends in ".csv" is skipped
    fp = lambda f: not os.path.isdir(os.path.join(path_to_dir, f)) and f.endswith(suffix)
    return [os.path.join(path_to_dir, fname) for fname in filenames if fp(fname)]

def convert_files(files, ascii, to="utf-8"):
    for name in files:
        print "Convert {0} from {1} to {2}".format(name, ascii, to)
        with open(name) as f:
            for line in f:
                # Decode from the source encoding, then re-encode to the target one
                print unicode(line, ascii).encode(to)

csv_files = find_csv_filenames('/path/to/csv/dir', ".csv")
convert_files(csv_files, "cp866")  # cp866 is my source encoding; replace it with yours.

Upvotes: 1

luc

Reputation: 43096

Your code only lists the CSV files; it doesn't do anything with them. If you need to read them, you can use the csv module. If you also need to handle the encoding, you can do something like this:

import csv, codecs
def safe_csv_reader(the_file, encoding, dialect=csv.excel, **kwargs):
    csv_reader = csv.reader(the_file, dialect=dialect, **kwargs)
    for row in csv_reader:
        yield [codecs.decode(cell, encoding) for cell in row]

# csv_file is assumed to be an already opened file object
reader = safe_csv_reader(csv_file, "utf-8", delimiter=',')
for row in reader:
    print row
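On Python 3 the decoding can be left to `open()`, so the reader yields text directly. A minimal sketch, assuming the files are cp1252 ("ANSI" on many Windows setups); `read_csv_rows` is a made-up name for illustration:

```python
import csv


def read_csv_rows(path, encoding="cp1252", **reader_kwargs):
    """Yield each row of the CSV file at path as a list of str."""
    # The csv module documentation recommends opening files with newline=""
    with open(path, "r", encoding=encoding, newline="") as handle:
        for row in csv.reader(handle, **reader_kwargs):
            yield row
```

Extra keyword arguments such as `delimiter=';'` are passed straight through to `csv.reader`.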

Upvotes: 0

Francesco Gramano

Reputation: 364

Refer to the documentation: http://docs.python.org/2/howto/unicode.html

If you have a string, say stored as s, that you want to encode in a specific encoding, call s.encode() with the name of that encoding.
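For example, going from "ANSI" bytes to UTF-8 bytes is a decode followed by an encode. A small Python 3 sketch, assuming the source bytes are cp1252:

```python
raw = b"caf\xe9"              # "café" as saved in cp1252 (assumed source encoding)
text = raw.decode("cp1252")   # bytes -> str
utf8 = text.encode("utf-8")   # str -> UTF-8 bytes
print(utf8)                   # b'caf\xc3\xa9'
```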

Upvotes: 0
