Christian Payne

Is there a way to convert ANSI-encoded (Windows-only) files to UTF-8 using Python on a Mac?

I am opening a new question because all the answers I can find use code that only runs on Windows.
Here is the situation.
Every month I receive new files for work that I need to convert from ANSI to UTF-8. There are enough of them that automating the job is worthwhile, so I wrote a Python script. Until recently I was on Windows and everything worked fine. After switching to a Mac, I realized that "ANSI" is a Windows-only encoding name, and now my script no longer works.
Question: Is there a way to convert ANSI-encoded CSVs to UTF-8 on a Mac?

Here is the code that WAS working on my Windows machine.

import sys
import os

if len(sys.argv) != 2:
  print("Converts the contents of a folder to UTF-8 from ANSI.")
  print(f"USAGE: \n\
    python ANSI_to_UTF8.py <Relative_Folder_Name> \n\
    If targeting a nested folder, make sure to use an escaped \\. ie: parent\\\\child")
  sys.exit()

from_encoding = "ANSI"
to_encoding = "UTF-8"
list_of_files = []
current_dir = os.getcwd()
folder = sys.argv[1]
suffix = "_utf8"
target_folder = folder + "_utf8"


try:
  os.mkdir(target_folder)
except FileExistsError:
  print("Target folder already exists.")
except:
  print("Error making directory!")

for root, dirs, files in os.walk(folder):
    for file in files:
        list_of_files.append(os.path.join(root,file))


for file in list_of_files:
  print(f"Converting {file}")

  original_path = file

  filename = file.split("\\")[-1].split(".")[0]
  extension = file.split("\\")[-1].split(".")[1]
  folder = "\\".join(original_path.split("\\")[0:-1])
  new_filename = filename + "." + extension
  new_path = os.path.join(target_folder, new_filename)

  f= open(original_path, 'r', encoding=from_encoding)
  content= f.read()
  f.close()
  f= open(new_path, 'w', encoding=to_encoding)
  f.write(content)
  f.close()

print(f"Finished converting {len(list_of_files)} files to {target_folder}")

No matter what approach I take, Python on my Mac does not recognize the ANSI encoding. Any help would be much appreciated. Thank you.

Edit 1: With reference to the question "Convert from ANSI to UTF-8":
That question has two answers, and neither works for me. With answer one, I get a UTF-8 decoding error:

UnicodeDecodeError: 'utf-8' codec can't decode byte 0xe9 in position 25101: invalid continuation byte
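That error is a strong hint that the files are not UTF-8 at all. In cp1252 (Windows-1252), byte 0xE9 is the single character "é", but in UTF-8 it is the lead byte of a three-byte sequence, so if the next byte is plain ASCII the decoder reports an invalid continuation byte. A minimal sketch using a made-up sample (the bytes below are an assumption for illustration, not taken from the actual files; `try_decode` is a hypothetical helper):

```python
# Hypothetical sample: "café,prix" as a cp1252-encoded file would store it.
# The "é" is the single byte 0xE9.
sample = b"caf\xe9,prix"


def try_decode(data: bytes, encoding: str) -> str:
    """Decode data, returning the text or a short description of the failure."""
    try:
        return data.decode(encoding)
    except UnicodeDecodeError as exc:
        return f"error at position {exc.start}: {exc.reason}"


print(try_decode(sample, "utf-8"))   # error at position 3: invalid continuation byte
print(try_decode(sample, "cp1252"))  # café,prix
```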

With answer two, I believe the root cause is that I am on a Mac, and Python on macOS does not support the mbcs encoding:

LookupError: unknown encoding: mbcs
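Both failures have the same cause. In CPython, "ansi" (like "dbcs") is registered as an alias of the mbcs codec, which encodes with the Windows ANSI code page and is only available on Windows builds; cp1252 is an ordinary codec available everywhere. A small sketch for checking what the interpreter on a given machine supports (`encoding_available` is a hypothetical helper name):

```python
import codecs
import sys
from encodings.aliases import aliases


def encoding_available(name: str) -> bool:
    """Return True if this Python build can look up the named codec."""
    try:
        codecs.lookup(name)
    except LookupError:
        return False
    return True


# "ansi" is only an alias; it points at the Windows-only mbcs codec.
print(aliases["ansi"])  # mbcs

for name in ("ANSI", "mbcs", "cp1252", "utf-8"):
    print(f"{name}: {encoding_available(name)}")
# On macOS or Linux, ANSI and mbcs report False; cp1252 and utf-8 report True.
print("running on Windows:", sys.platform == "win32")
```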

Answers (1)

Christian Payne

I found an answer to this problem.
Changing the ANSI codec to cp1252 (Windows-1252, the code page that "ANSI" usually refers to on Western-language Windows systems) let Python on my Mac find the codec I was looking for, so that fixed the issue. Another issue I came across right after was that macOS handles file paths differently, using forward slashes instead of backslashes.
After a few more changes to the script, I came up with a working version.

import sys
import os

if len(sys.argv) != 2:
  print("Converts the contents of a folder to UTF-8 from ANSI (cp1252).")
  print("USAGE: \n\
    python ANSI_to_UTF8.py <Relative_Folder_Name> \n\
    If targeting a nested folder, separate the folder names with a forward slash, e.g. parent/child")
  sys.exit()

from_encoding = "cp1252"  # Windows-1252, what "ANSI" usually means on Western Windows systems
to_encoding = "UTF-8"
list_of_files = []
folder = sys.argv[1]
suffix = "_utf8"
target_folder = folder + suffix


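try:
  os.mkdir(target_folder)
except FileExistsError:
  print("Target folder already exists.")
except OSError as err:
  # Without the target folder every write below would fail, so stop here.
  print(f"Error making directory: {err}")
  sys.exit(1)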

for root, dirs, files in os.walk(folder):
    for file in files:
        list_of_files.append(os.path.join(root,file))


for file in list_of_files:
  print(f"Converting {file}")

  # os.path.basename splits on the current OS's separator ("/" on macOS)
  # and keeps names with several dots (e.g. "report.v2.csv") intact.
  new_filename = os.path.basename(file)
  new_path = os.path.join(target_folder, new_filename)

  # "with" closes each file even if decoding fails partway through.
  with open(file, 'r', encoding=from_encoding) as f:
    content = f.read()
  with open(new_path, 'w', encoding=to_encoding) as f:
    f.write(content)

print(f"Finished converting {len(list_of_files)} files to {target_folder}")

The changes are small, but this version lets Python on the Mac recognize the encoding and build the output paths correctly.
Thanks again to all who helped!
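For comparison, here is a minimal sketch of a more portable variant of the same conversion, offered as an alternative rather than as the approach above. It lets `pathlib` handle path separators, mirrors the subfolder layout so that identically named files in different subfolders do not overwrite each other, and assumes every input file is cp1252. `convert_folder` is a hypothetical helper name.

```python
from pathlib import Path

# Assumption: every input file is Windows-1252 ("ANSI" on a Western Windows setup).
FROM_ENCODING = "cp1252"
TO_ENCODING = "utf-8"


def convert_folder(source: Path, target: Path) -> int:
    """Re-encode every file under `source` into `target`, mirroring subfolders.

    Hypothetical helper; returns the number of files converted.
    """
    count = 0
    for src in sorted(source.rglob("*")):
        if not src.is_file():
            continue
        # Path arithmetic uses the current OS's separator, so there is no
        # manual splitting on "\\" or "/".
        dst = target / src.relative_to(source)
        dst.parent.mkdir(parents=True, exist_ok=True)
        # newline="" turns off newline translation, so CRLF line endings
        # from Windows files are copied through unchanged.
        with src.open("r", encoding=FROM_ENCODING, newline="") as fin, \
                dst.open("w", encoding=TO_ENCODING, newline="") as fout:
            fout.write(fin.read())
        count += 1
    return count
```

For example, `convert_folder(Path("reports"), Path("reports_utf8"))` would write `reports/sub/file.csv` to `reports_utf8/sub/file.csv` (folder names are illustrative only). If a file contains one of the few bytes cp1252 leaves undefined, such as 0x81, decoding raises `UnicodeDecodeError`, which is a useful sign that the file is not cp1252 after all.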