Python3 UnicodeDecodeError on utf8

Question

No matter what I do, I can't fix this. This is the script I need to fix:

# Read the original file and write to a new file
input_file = 'input.txt'
output_file = 'output.txt'

with open(input_file, 'rb') as f:
    content = f.read()

# Replace undecodable bytes: errors='replace' turns each one into U+FFFD,
# which is then swapped for '?'
cleaned_content = content.decode('utf-8', errors='replace').replace('\ufffd', '?')

# Split the cleaned content into lines
lines = cleaned_content.splitlines()

# Sort the lines
sorted_lines = sorted(lines)

# Write the sorted lines to a new file
with open(output_file, 'w', encoding='utf-8') as f:
    for line in sorted_lines:
        f.write(line + '\n')

What I want is for the file to never raise UnicodeDecodeError when I open it with with open(file_path, 'r', encoding='utf-8') as file:

Long story short, I have a byte-search script that works on the sorted file. If I open the file with with open(file_path, 'r', encoding='utf-8', errors='replace') as file: the search doesn't work properly, because that replaces the characters that would normally raise UnicodeDecodeError. Imagine the file is being read like this:

a
b
�
d

If it is searching for "c" and lands on the line starting with �, it checks whether c comes before or after �, and then goes in the wrong direction (up instead of down, say), because the file is sorted according to its UTF-8 content, not the replaced character.
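To make the mismatch concrete, here is a minimal sketch. The byte values are made up for illustration (a lone 0x80 byte standing in for whatever invalid data is in my file); it only shows that comparing text decoded with errors='replace' can disagree with the order of the raw bytes.

```python
# Hypothetical sample values, not taken from the real file.
valid = "é".encode("utf-8")   # b'\xc3\xa9', a valid two-byte sequence
invalid = b"\x80"             # lone continuation byte, not valid UTF-8

# Order of the raw bytes as stored on disk: 0x80 sorts before 0xC3.
byte_order = invalid < valid

# Order after decoding with errors='replace': the invalid byte becomes
# U+FFFD, which sorts after U+00E9.
decoded_invalid = invalid.decode("utf-8", errors="replace")
decoded_valid = valid.decode("utf-8")
text_order = decoded_invalid < decoded_valid

print(byte_order, text_order)
```

The two comparisons give opposite answers, which is the kind of wrong turn described above.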

I want to make sure the file never raises UnicodeDecodeError, because every character that could cause that error has been replaced by "?" before sorting.
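As a sketch of what I am aiming for, assuming the whole file fits in memory: the function below (clean_and_sort is a hypothetical name, as is the sample data) replaces undecodable bytes with "?", splits on "\n" only, and sorts the lines by their encoded bytes so the order matches what a byte-level search sees. The result can be checked with a strict decode.

```python
def clean_and_sort(data: bytes) -> bytes:
    """Return valid UTF-8 with lines sorted by their encoded bytes."""
    # Each undecodable byte becomes U+FFFD, which is then replaced by '?'.
    text = data.decode("utf-8", errors="replace").replace("\ufffd", "?")

    # Split on '\n' only; str.splitlines() also splits on other separators.
    lines = text.split("\n")
    if lines and lines[-1] == "":
        lines.pop()  # drop the empty piece after a trailing newline
    lines = [line.rstrip("\r") for line in lines]

    # Sort the encoded lines so the order is the on-disk byte order.
    encoded = sorted(line.encode("utf-8") for line in lines)
    return b"".join(line + b"\n" for line in encoded)


sample = b"d\nb\n\xff\xfe\na\n"   # made-up input with two invalid bytes
result = clean_and_sort(sample)
result.decode("utf-8")            # strict decode, raises if anything is invalid
```

Writing the result with open(output_file, 'wb') would keep the newlines and bytes exactly as produced here.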

No matter what I try, the file still ends up with those weird characters.
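One thing I have not ruled out, and this is only an assumption since the search script is not shown here: a byte search that jumps to an arbitrary offset can land in the middle of a multi-byte character, and decoding from there fails even when the file itself is valid UTF-8. A minimal sketch with made-up data (line_at is a hypothetical helper):

```python
data = "a\né\nz\n".encode("utf-8")   # valid UTF-8: b'a\n\xc3\xa9\nz\n'


def line_at(buf: bytes, offset: int) -> bytes:
    """Return the full line containing the byte at offset."""
    start = buf.rfind(b"\n", 0, offset) + 1   # 0 when no earlier newline
    end = buf.find(b"\n", offset)
    if end == -1:
        end = len(buf)
    return buf[start:end]


# Offset 3 is the second byte of 'é'; decoding from there fails.
try:
    data[3:].decode("utf-8")
    mid_char_fails = False
except UnicodeDecodeError:
    mid_char_fails = True

# Backing up to the start of the line first gives a clean decode.
whole_line = line_at(data, 3).decode("utf-8")
```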

How can I do that?
