Reputation: 1886
I have a large file that I'm trying to import. The file is made up of millions of rows of customer-created data. As a result, some users have entered characters that the encoding does not recognise (fewer than 1 character per 100,000).
However, this causes the code to break, because it doesn't recognise the character, and gives me the following error:
UnicodeEncodeError: 'charmap' codec can't encode character '\x96' in position 619: character maps to <undefined>
In the specific case above, the encoding doesn't recognise the long hyphen.
The code I am currently using to read the file and apply some transformations is:
def conversion(path, source, count):
    file = open(path, "w")
    iFile = open(source, 'r', encoding="utf-8")
    len_text = 1
    file.write("[\n")
    for line in iFile: # For all the lines in the file
        line = line.strip() # Remove newline/whitespace from begin and end of line
        line = line.replace('"newDetails":{','')
        line = line.replace('},"addrDate"',',"addrDate"')
        line = line.replace('},"open24Id"',',"open24Id"')
        if len_text != count: # While len_text does not equal line_count
            line+= r"," # Add , to end of the line
            line+= "\n" # Add \n to end of line
            file.write(line) # Write line to file
        else:
            line += "\n" # Add \n to end of line
            file.write(line) # Write line to file
        len_text += 1 # Increment len_text by 1
    file.write("]") # Write ] to end of file
    file.close() # Close file
    return
The break occurs on file.write(line).
How can I tell the script to search for and replace the character \x96 with another character?
Upvotes: 2
Views: 524
Reputation: 310
Based on my comment: wrap the failing call in a try/except. The try block runs the code that may raise the error, and the except block decides what happens when it does. So, inside your for loop,
try:
    file.write(line)
except UnicodeEncodeError:
    continue
would skip the line that fails, whereas something like
try:
    file.write(line)
except UnicodeEncodeError:
    file.write("Your character")
would write the text of your choice in place of the line that could not be encoded. This lets you keep the rest of your code as it is. It is only a generic example, so play with it and adapt it to how you want it to work.
Upvotes: 2