Reputation: 31
I am opening a new question because all the answers I can find use code that only runs on Windows.
Here is the situation...
Every month I receive new files for work that I need to convert from an ANSI encoding to UTF-8. There are enough files that I need to automate this, so I wrote a Python script. Until recently I was on Windows and everything worked fine. After switching to a Mac, I realized that ANSI is a Windows-only encoding name, and my script no longer works.
Question:
Is there a way to convert ANSI-encoded CSVs to UTF-8 on a Mac?
Here is the code that WAS working on my Windows machine.
import sys
import os
if len(sys.argv) != 2:
    print(f"Converts the contents of a folder to UTF-8 from ASCI.")
    print(f"USAGE: \n\
python ANSI_to_UTF8.py <Relative_Folder_Name> \n\
If targeting a nested folder, make sure to use an escaped \\. ie: parent\\\\child")
    sys.exit()
from_encoding = "ANSI"
to_encoding = "UTF-8"
list_of_files = []
current_dir = os.getcwd()
folder = sys.argv[1]
suffix = "_utf8"
target_folder = folder + "_utf8"
try:
    os.mkdir(target_folder)
except FileExistsError:
    print("Target folder already exists.")
except:
    print("Error making directory!")
for root, dirs, files in os.walk(folder):
    for file in files:
        list_of_files.append(os.path.join(root,file))
for file in list_of_files:
    print(f"Converting {file}")
    original_path = file
    filename = file.split("\\")[-1].split(".")[0]
    extension = file.split("\\")[-1].split(".")[1]
    folder = "\\".join(original_path.split("\\")[0:-1])
    new_filename = filename + "." + extension
    new_path = os.path.join(target_folder, new_filename)
    f= open(original_path, 'r', encoding=from_encoding)
    content= f.read()
    f.close()
    f= open(new_path, 'w', encoding=to_encoding)
    f.write(content)
    f.close()
print(f"Finished converting {len(list_of_files)} files to {target_folder}")
It seems that no matter what approach I take, my Mac does not recognize the ANSI encoding type. Any help would be much appreciated. Thank you.
Edit 1: Regarding the question "Convert from ANSI to UTF-8":
That question has two answers, and neither works for me. With the first answer, I get a UTF-8 decode error:
UnicodeDecodeError: 'utf-8' codec can't decode byte 0xe9 in position 25101: invalid continuation byte
With the second answer, I believe the root cause is that I am on a Mac, where Python does not provide the mbcs codec:
LookupError: unknown encoding: mbcs
Upvotes: 1
Views: 1896
Reputation: 31
I found an answer to this problem.
Changing the codec from ANSI to cp1252 fixed the encoding problem: Python on the Mac does not recognize "ANSI" as a codec name, but it does recognize cp1252, the Windows code page my files actually use. Right after that I ran into a second issue: the Mac separates file paths with forward slashes instead of backslashes.
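To illustrate why cp1252 works where the other approaches fail, here is a small sketch. The sample row and the helper name try_decode are made up for illustration; the byte 0xe9 is the one from the traceback in the question, and cp1252 maps it to "é".

```python
# Hypothetical sample row; the byte 0xe9 is "é" in cp1252.
raw = b"caf\xe9,1\n"


def try_decode(data, encoding):
    """Return the decoded text, or the name of the exception that was raised."""
    try:
        return data.decode(encoding)
    except (UnicodeDecodeError, LookupError) as err:
        return type(err).__name__


as_cp1252 = try_decode(raw, "cp1252")  # decodes cleanly
as_utf8 = try_decode(raw, "utf-8")     # 0xe9 is not valid UTF-8 here
as_mbcs = try_decode(raw, "mbcs")      # mbcs is a Windows-only codec

print(repr(as_cp1252))
print(as_utf8)
print(as_mbcs)
```

On a Mac the last line should report a LookupError, matching the second error in the question. On Windows the mbcs codec exists and decodes with the system code page, which is why the original script only worked there.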
After a few further modifications to the original script, I came up with a working version:
import sys
import os

# Show usage and exit unless exactly one folder argument is given.
if len(sys.argv) != 2:
    print("Converts the contents of a folder from ANSI (cp1252) to UTF-8.")
    print("USAGE:\n"
          "python ANSI_to_UTF8.py <Relative_Folder_Name>\n"
          "If targeting a nested folder, use forward slashes, e.g. parent/child")
    sys.exit()

# "ANSI" is not a codec name Python recognizes on a Mac; cp1252 is the
# Windows code page that worked for these files.
from_encoding = "cp1252"
to_encoding = "UTF-8"
list_of_files = []
folder = sys.argv[1]
suffix = "_utf8"
target_folder = folder + suffix

try:
    os.mkdir(target_folder)
except FileExistsError:
    print("Target folder already exists.")
except OSError:
    print("Error making directory!")
    sys.exit(1)

# Collect every file under the source folder, including nested folders.
for root, dirs, files in os.walk(folder):
    for file in files:
        list_of_files.append(os.path.join(root, file))

for original_path in list_of_files:
    print(f"Converting {original_path}")
    # os.path.basename uses the path separator of the current OS, so there is
    # no need to split on "/" or "\\" by hand. It also keeps names that
    # contain several dots, or no extension at all, intact.
    new_filename = os.path.basename(original_path)
    new_path = os.path.join(target_folder, new_filename)
    # Read with the source encoding, then write the same text back as UTF-8.
    with open(original_path, 'r', encoding=from_encoding) as f:
        content = f.read()
    with open(new_path, 'w', encoding=to_encoding) as f:
        f.write(content)

print(f"Finished converting {len(list_of_files)} files to {target_folder}")
The changes are small, but this version lets Python on the Mac recognize the encoding and resolve the file paths correctly.
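For anyone who wants to avoid the path separator question entirely, here is a minimal sketch of the same conversion using pathlib. It is not the script I actually run: the function name convert_tree and its parameters are made up for illustration, and it assumes every file in the folder is cp1252 text. Unlike the script above, it keeps nested subfolders instead of writing everything into one flat folder, and it passes newline="" so the original line endings are left as they are.

```python
from pathlib import Path


def convert_tree(source, target, from_encoding="cp1252", to_encoding="utf-8"):
    """Re-encode every file under source into target, keeping subfolders.

    Hypothetical helper: assumes all files are text in from_encoding and
    that target is not located inside source.
    """
    source, target = Path(source), Path(target)
    converted = []
    for path in sorted(source.rglob("*")):
        if not path.is_file():
            continue
        # Mirror the relative location of the file under the target folder.
        out_path = target / path.relative_to(source)
        out_path.parent.mkdir(parents=True, exist_ok=True)
        # newline="" disables newline translation on both read and write.
        with open(path, "r", encoding=from_encoding, newline="") as src:
            content = src.read()
        with open(out_path, "w", encoding=to_encoding, newline="") as dst:
            dst.write(content)
        converted.append(out_path)
    return converted
```

Calling convert_tree("reports", "reports_utf8") would mirror a folder named reports (a placeholder name) into reports_utf8. Because pathlib builds the paths, the same call should work on both Windows and Mac.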
Thanks again to all who helped!
Upvotes: 1