Reputation: 1628
Users of our application can upload plain text files. These files may then be added as attachments to outgoing ActionMailer emails. Recently, an attempt to send one of these emails failed with an invalid byte sequence in UTF-8 error, and the email was not sent. This symbol, �, appears throughout the offending attachment.
We're using ActionMailer, so although it should go without saying, here is representative code for the attachment step inside the mailer method:
attachments['file-name.txt'] = File.read('file-name.txt')
From a business standpoint, we don't care about the content of these text files. Ideally, I'd like our application to ignore the content and simply attach the files to emails.
Is it possible to tell Rails / ActionMailer to ignore the file's encoding? Or should I parse the incoming text file and strip out non-UTF-8 characters?
I searched similar questions here on Stack Overflow, but none addressed the problem I'm facing.
Edit: I called #readlines on the file in a Rails console and found that the black diamond represents the byte \xA0. This is likely a non-breaking space in Latin-1 (ISO 8859-1).
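A quick console check of that theory, using a made-up sample string rather than the real file: a lone 0xA0 byte is not a valid UTF-8 sequence, but in ISO 8859-1 it is a complete character (U+00A0, NO-BREAK SPACE) that transcodes cleanly to UTF-8.

```ruby
# Hypothetical sample: the bytes "caf" followed by a lone 0xA0 byte.
raw = "caf\xA0".b

# Tagged as UTF-8, the lone 0xA0 byte is an invalid sequence.
as_utf8 = raw.dup.force_encoding(Encoding::UTF_8)
puts as_utf8.valid_encoding? # prints false

# Tagged as Latin-1, every byte is a valid character, so it transcodes cleanly.
as_latin1 = raw.dup.force_encoding(Encoding::ISO_8859_1)
converted = as_latin1.encode(Encoding::UTF_8)
puts converted == "caf\u00A0"        # prints true
puts converted.bytes.last(2).inspect # prints [194, 160]
```

In UTF-8 the same non-breaking space takes two bytes (C2 A0), which is why the single Latin-1 byte shows up as �.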
Upvotes: 1
Views: 1775
Reputation: 1628
When reading the file at attachment time, I can use the following:
mail.attachments[file.file_name.to_s] = File.read(path_to_file).force_encoding("BINARY").gsub(0xA0.chr,"")
The important addition is the following, chained after the call to File.read(...):
.force_encoding("BINARY").gsub(0xA0.chr,"")
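To see what this band-aid does at the byte level, here it is applied to in-memory strings standing in for the result of File.read (the sample contents are hypothetical). It removes every 0xA0 byte, including one that is the second half of a legitimate UTF-8 character such as "à" (bytes C3 A0), so it is only safe when the input is known not to be UTF-8.

```ruby
# Stand-in for File.read: a UTF-8-tagged string holding hypothetical Latin-1 bytes
# ("a", a non-breaking space 0xA0, "b"). It is invalid as UTF-8.
read_result = "a\xA0b".dup
cleaned = read_result.force_encoding("BINARY").gsub(0xA0.chr, "")
puts cleaned.inspect # prints "ab"

# Side effect: in genuine UTF-8 text, "à" is stored as the bytes C3 A0,
# so the same gsub removes its second byte and leaves a stray 0xC3 behind.
utf8_text = "voilà".dup
damaged = utf8_text.force_encoding("BINARY").gsub(0xA0.chr, "")
puts damaged.bytes.last # prints 195
puts damaged.dup.force_encoding(Encoding::UTF_8).valid_encoding? # prints false
```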
The stripping and re-encoding should really happen when the file is uploaded to our system, so this answer isn't the real resolution; it's a short-term band-aid.
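A minimal sketch of doing the conversion once, at upload time, under the assumption that any upload which isn't valid UTF-8 is Latin-1 (ISO 8859-1). The helper name normalize_to_utf8 and the upload_path variable are hypothetical; where the call belongs (controller, model callback, background job) depends on the app. Because every byte sequence is valid Latin-1, the encode call cannot raise, but a file in some other legacy encoding would be silently mis-decoded.

```ruby
# Hypothetical helper: return UTF-8 text for an uploaded file's raw bytes.
# Assumption: anything that is not already valid UTF-8 is ISO 8859-1.
def normalize_to_utf8(bytes)
  as_utf8 = bytes.dup.force_encoding(Encoding::UTF_8)
  return as_utf8 if as_utf8.valid_encoding?

  # Every byte is a valid Latin-1 character, so this transcoding never raises.
  bytes.dup.force_encoding(Encoding::ISO_8859_1).encode(Encoding::UTF_8)
end

# Usage sketch (upload_path is hypothetical):
#   File.write(upload_path, normalize_to_utf8(File.binread(upload_path)))
```

Files that are already valid UTF-8 pass through unchanged, so running the helper on every upload is harmless for them.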
Upvotes: 0
Reputation: 8345
With your edit, this seems pretty clear to me:
File.read tags the string with Ruby's default external encoding, which is UTF-8 in a typical Rails app. You can verify this by logging str.encoding (where str is the value returned by File.read).
File.read does not actually validate the encoding; it only slurps in the bytes and slaps on the encoding (like force_encoding).
If your text files are encoded in Latin-1, use File.read(path, encoding: Encoding::ISO_8859_1). That may fix it; let us know if it doesn't.
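A small sketch of that check, using a temporary file with made-up Latin-1 contents in place of a real upload. The encoding is passed explicitly to mimic an app whose default external encoding is UTF-8. The "ISO-8859-1:UTF-8" (external:internal) form, which transcodes to UTF-8 while reading, goes one step beyond what the answer suggests.

```ruby
require "tempfile"

# Tempfile.create with a block deletes the file afterwards and returns the block's value.
strings = Tempfile.create("latin1-sample") do |file|
  file.binmode
  file.write("caf\xA0".b) # hypothetical Latin-1 bytes: "caf" + non-breaking space
  file.flush

  {
    # Tagged as UTF-8: no error is raised, but the string is invalid.
    utf8: File.read(file.path, encoding: Encoding::UTF_8),
    # Tagged as Latin-1: valid, but still not UTF-8.
    latin1: File.read(file.path, encoding: Encoding::ISO_8859_1),
    # external:internal form: read as Latin-1, transcode to UTF-8.
    transcoded: File.read(file.path, encoding: "ISO-8859-1:UTF-8")
  }
end

puts strings[:utf8].valid_encoding?      # prints false
puts strings[:latin1].valid_encoding?    # prints true
puts strings[:transcoded] == "caf\u00A0" # prints true
```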
Upvotes: 0
Reputation: 4383
If Ruby is having problems reading the file and corrupting the characters during the read, try File.binread, which File inherits from IO:
...
attachments['attachment.txt'] = File.binread('/path/to/file')
...
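A quick way to confirm the difference: File.binread returns the raw bytes tagged as ASCII-8BIT (binary), an encoding in which every byte is valid, so the attachment body is handed over as plain bytes rather than as invalid UTF-8. The temporary file and its contents below are made up for illustration.

```ruby
require "tempfile"

raw = Tempfile.create("binread-sample") do |file|
  file.binmode
  file.write("caf\xA0".b) # hypothetical bytes that are not valid UTF-8
  file.flush
  File.binread(file.path)
end

puts raw.encoding == Encoding::BINARY # prints true
puts raw.valid_encoding?              # prints true
puts raw.bytesize                     # prints 4
```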
If your file already contains corrupted characters, you can either find some process to 'uncorrupt' them, which is not fun, or get rid of them by re-encoding from ASCII-8BIT to UTF-8 and replacing the invalid characters (with U+FFFD, the � symbol, by default; add replace: '' to strip them entirely).
...
attachments['attachment.txt'] = File.binread('/path/to/file')
.encode('utf-8', 'binary', invalid: :replace, undef: :replace)
...
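(String#scrub does something similar, but it only finds invalid bytes in a string tagged as UTF-8; since File.binread returns ASCII-8BIT, you would have to call force_encoding('UTF-8') first.)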
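A sketch comparing these options on a hypothetical byte string. With invalid: :replace and undef: :replace, the unmappable byte becomes U+FFFD (the same � seen in the question) unless replace: "" is also passed; scrub works as well, but only after the binary string is re-tagged as UTF-8.

```ruby
bytes = "caf\xA0".b # hypothetical contents of a file read with File.binread

# Re-encode from binary to UTF-8; 0xA0 has no mapping, so it is replaced with U+FFFD.
replaced = bytes.encode("utf-8", "binary", invalid: :replace, undef: :replace)
puts replaced == "caf\uFFFD" # prints true

# Passing replace: "" drops the byte instead of substituting U+FFFD.
stripped = bytes.encode("utf-8", "binary", invalid: :replace, undef: :replace, replace: "")
puts stripped == "caf" # prints true

# String#scrub only sees invalid sequences once the string is tagged as UTF-8.
scrubbed = bytes.dup.force_encoding(Encoding::UTF_8).scrub("")
puts scrubbed == "caf" # prints true
```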
Upvotes: 2