Reputation: 77
The following error occurs when I download the files compressed into a single zip file:
invalid byte sequence in UTF-8
To fix this error I have to remove the characters that are invalid in UTF-8, so I used the encode method to convert the string from UTF-8 to UTF-8. However, the file name I want to display is then not shown correctly (it looks like the image).
file_name.encode!("UTF-8", "UTF-8", invalid: :replace)
Is there any solution to this problem?
I would be glad to know.
Zip::File.open_buffer(obj) do |zip|
  zip.each do |entry|
    ext = File.extname(entry.name)
    file_name = File.basename(entry.name)
    # file_name.encode!("UTF-8", "UTF-8", invalid: :replace)
    next if ext.blank? || file_name.count(".") > 1
    dir = File.join(dir_name, File.dirname(entry.name))
    FileUtils.mkpath(dir.to_s)
    zip.extract(entry, dir + ".txt" || ".jpg" || ".csv") {true}
    file_name.force_encoding("UTF-8")
    new_file_name = "#{dir_name}/#{file_name}"
    new_file_name.force_encoding("UTF-8")
    File.rename(dir + ".txt" || ".jpg" || ".csv", new_file_name)
    @input_dir << new_file_name
  end
end
Zip::OutputStream.open(zip_file.path) do |zip_data|
  @input_dir.each do |file|
    zip_data.put_next_entry(file)
    zip_data.write(File.read(file.to_s))
  end
end
Environment: macOS Catalina 10.15.7, Ruby 2.6.3
Upvotes: 1
Views: 2625
Reputation: 114138
You get these errors because the Zip gem assumes the filenames to be encoded in UTF-8 but they are actually in a different encoding.
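As a rough illustration of why the UTF-8 to UTF-8 conversion from the question loses the Japanese characters, here is a minimal sketch. It uses the byte values of the base name from the byte array in this answer, and String#scrub (Ruby 2.1+) as a stand-in for encode with invalid: :replace. The variable names are only for illustration.

```ruby
# Bytes of the base name as stored in the archive (values taken from the
# byte array in this answer).
raw = [87, 78, 83, 95, 85, 80, 151, 112, 131, 102,
       129, 91, 131, 94, 46, 116, 120, 116].pack('C*')

# Tagging the bytes as UTF-8 does not change them; they are still invalid.
as_utf8 = raw.dup.force_encoding('UTF-8')
utf8_valid = as_utf8.valid_encoding?

# Replacing the invalid bytes yields valid UTF-8, but the original
# characters are lost: the invalid bytes are replaced with U+FFFD.
scrubbed = as_utf8.scrub

# Interpreting the same bytes in their actual encoding keeps the text.
converted = raw.encode('UTF-8', 'Windows-31J')

puts utf8_valid
puts scrubbed
puts converted
```

In other words, replacing invalid bytes only hides the error. The bytes have to be interpreted in the encoding they were written in.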
To fix the error, you first have to find the correct encoding. Let's re-create the string from its bytes:
bytes = [111, 117, 116, 112, 117, 116, 50, 48, 50, 48, 49,
         50, 48, 55, 95, 49, 52, 49, 54, 48, 50, 47, 87,
         78, 83, 95, 85, 80, 151, 112, 131, 102, 129, 91,
         131, 94, 46, 116, 120, 116]
string = bytes.pack('c*')
#=> "output20201207_141602/WNS_UP\x97p\x83f\x81[\x83^.txt"
We can now traverse Encoding.list and select the encodings that return the expected result:
Encoding.list.select do |enc|
  s = string.encode('UTF-8', enc) rescue next
  s.end_with?('WNS_UP用データ.txt')
end
#=> [
# #<Encoding:Windows-31J>,
# #<Encoding:Shift_JIS>,
# #<Encoding:SJIS-DoCoMo>,
# #<Encoding:SJIS-KDDI>,
# #<Encoding:SJIS-SoftBank>
# ]
All of the above encodings result in the correct output.
Back to your code, you could use:
path = entry.name.encode('UTF-8', 'Windows-31J')
#=> "output20201207_141602/WNS_UP用データ.txt"
ext = File.extname(path)
#=> ".txt"
file_name = File.basename(path)
#=> "WNS_UP用データ.txt"
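If the archive may contain a mix of UTF-8 and Windows-31J names, the conversion could be wrapped in a small helper. This is only a sketch: utf8_entry_name and its fallback parameter are hypothetical names, not part of the Zip gem, and it assumes that a name which is already valid UTF-8 should be left untouched (a Windows-31J name could in rare cases also happen to be valid UTF-8).

```ruby
# Hypothetical helper: returns a UTF-8 copy of a zip entry name.
# Assumption: names that are already valid UTF-8 are kept as they are;
# everything else is interpreted in the fallback encoding.
def utf8_entry_name(name, fallback: 'Windows-31J')
  candidate = name.dup.force_encoding('UTF-8')
  return candidate if candidate.valid_encoding?

  # Reinterpret the raw bytes in the fallback encoding and transcode them.
  name.dup.force_encoding(fallback)
      .encode('UTF-8', invalid: :replace, undef: :replace)
end

# Bytes of the base name from the example above.
raw = [87, 78, 83, 95, 85, 80, 151, 112, 131, 102,
       129, 91, 131, 94, 46, 116, 120, 116].pack('C*')

fixed = utf8_entry_name(raw)
plain = utf8_entry_name('readme.txt')

puts fixed
puts File.extname(fixed)
puts plain
```

Inside the zip.each loop this would be called as path = utf8_entry_name(entry.name) before computing ext and file_name.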
The Zip gem also has an option to set an explicit encoding for non-ASCII file names. You might want to give it a try by setting Zip.force_entry_names_encoding = 'Windows-31J' (I haven't tried it myself).
Upvotes: 2