Cjoerg
Cjoerg

Reputation: 1325

How best to encode or clean up the email body when collecting mails through Ruby Net::IMAP

I am collecting emails from an IMAP server with the code beneath, but the content of the email body is often very ugly, and sometimes impossible to understand. Many of the emails contains Danish and Swedish special characters like e.g. æ, ä, ö, ø and å, but I don't think that is the problem. How best to encode and clean up?

imap = Net::IMAP.new(address, port, enable_ssl?)
imap.login(user_name, password)
imap.examine(flag)

search_query = "#{last_uid}:*"

imap.uid_search(search_query).each do |uid|
  if uid.to_i > last_uid.to_i

    header = imap.uid_fetch(uid, "BODY[HEADER.FIELDS (FROM TO DATE SUBJECT)]")[0].attr["BODY[HEADER.FIELDS (FROM TO DATE SUBJECT)]"]
    from = Mail.read_from_string(header).from.first
    to = Mail.read_from_string(header).to.first rescue nil
    subject = Mail.read_from_string(header).subject
    date = Mail.read_from_string(header).date

    body = imap.uid_fetch(uid, "BODY[TEXT]")[0].attr["BODY[TEXT]"].gsub(/\r\n?/, "\n").force_encoding('UTF-8')

  end
end
imap.logout()
imap.disconnect()

Sample body content:

1:

LS0tLS0tLS0tLS0tLS0tLS0tLS0tLS0tLS0tLS0tLS0tLS0tLS0tLS0tLS08YnI+DQpPcmRy
ZWRhdG86IDI4LTAzLTIwMTMgMTQ6NDc6MTg8YnI+DQpPcmRyZW51bW1lcjogMTA5MDM1PGJy
Pg0KVHJhbnNha3Rpb25zSUQ6IDE2NzgyMQ0KPGJyPjxicj4NCkZha3R1cmVyaW5nc2FkcmVz
c2U6PGJyPg0KLS0tLS0tLS0tLS0tLS0tLS0tLS0tLS0tLS0tLS0tLS0tLS0tLS0tLS0tLS0t
LS0tLS0tLS0tLS0tLS0tLS0tLS0tLS0tLS0tLTxicj48YnI+DQpOaWtsYXMgSnV1bCBOaWVs
c2VuPGJyIC8+QS5QLiBNw7hsbGVyIEtvbGxlZ2lldCAxMDU8YnIgLz41NzAwIFN2ZW5kYm9y
ZzxiciAvPkRlbm1hcms8YnIgLz5UTEY6OiAyMDYzMDczNzxiciAvPjxhIGhyZWY9Im1haWx0
bzpuaWtzQGxpdmUuZGsiPm5pa3NAbGl2ZS5kazwvYT48YnIgLz4NCjxicj48YnI+DQpMZXZl
cmluZ3NhZHJlc3NlOjxicj4NCi0tLS0tLS0tLS0tLS0tLS0tLS0tLS0tLS0tLS0tLS0tLS0t
LS0tLS0tLS0tLS0tLS0tLS0tLS0tLS0tLS0tLS0tLS0tLS0tLS08YnI+PGJyPg0KTmlrbGFz
IEp1dWwgTmllbHNlbjxiciAvPkEuUC4gTcO4bGxlciBLb2xsZWdpZXQgMTA1PGJyIC8+NTcw
MCBTdmVuZGJvcmc8YnIgLz5EZW5tYXJrPGJyIC8+VExGOjogMjA2MzA3Mzc8YnIgLz48YSBo
cmVmPSJtYWlsdG86bmlrc0BsaXZlLmRrIj5uaWtzQGxpdmUuZGs8L2E+PGJyIC8+DQo8YnI+
PGJyPg0KT3JkcmVkYXRhOjxicj4NCi0tLS0tLS0tLS0tLS0tLS0tLS0tLS0tLS0tLS0tLS0t
LS0tLS0tLS0tLS0tLS0tLS0tLS0tLS0tLS0tLS0tLS0tLS0tLS0tLS08YnI+DQoNCiAgMSww
MCBzdGsuIFN0YXIgV2FycyBCYXR0bGVmcm9udCBJSSBYYm94ICg0MTAzMikgw6EgREtLIDI2
Myw5OSAtIElhbHQ6IERLSyAzMjksOTkNCjxicj4NCjxicj4NCkJldGFsaW5nOiAyOiBEYW5z
a2Uga3JlZGl0a29ydCBbdHJhbnNha3Rpb25zZ2VieXIgMSwyNSVdIChES0sgNCwxMykNCjxi
cj4NCkZvcnNlbmRlbHNlOiAgKERLSyAwLDAwKQ0KPGJyPjxicj4NClNhbWxldCBwcmlzIDog
REtLIDMzNCwxMg0KPGJyPg0KSGVyYWYgbW9tczogREtLIDY2LDgzDQo=

2 (shortened):

------=_NextPart_000_0482_01CE2B9E.A689A9F0
Content-Type: multipart/related;
    boundary="----=_NextPart_001_0483_01CE2B9E.A689A9F0"

------=_NextPart_001_0483_01CE2B9E.A689A9F0
Content-Type: multipart/alternative;
boundary="----=_NextPart_002_0484_01CE2B9E.A689A9F0"

------=_NextPart_002_0484_01CE2B9E.A689A9F0
Content-Type: text/plain; charset="iso-8859-1"
Content-Transfer-Encoding: quoted-printable

=20



            =09
    =09
    =09
=09
=20

=09

                    =09
                        =09
                        =09
                        =09
                        =09
                        =09
                        =09
                        =09

Daily Restock Information.

=09
=09

Item

Format

1+=20

 5+ =20

 Box Price=20

Qty

Barcode

=09

3 (shortened):

--Boundary-=_SHccxHuUYYhTGDGLfcIEBDUToEun
Content-Type: text/plain; charset="ISO-8859-1"


--Boundary-=_SHccxHuUYYhTGDGLfcIEBDUToEun
Content-Type: application/octet-stream
Content-Disposition: attachment; filename="SYSTEMSTOCK.XLSX"
Content-Transfer-Encoding: base64

UEsDBBQABgAIAAAAIQC5OlcVkgEAAIwGAAATAN0BW0NvbnRlbnRfVHlwZXNdLnhtbCCi2QEooAAC
AAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAA
AAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAA
AAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAA
AAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAA
AAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAA
AAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAA
AAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAA
AAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAA
AAAAAAAAAAAAAAAAAMRVyWrDMBC9F/oPRtcSK0mhlBInhy7HNpD0AxRrEovYktBMtr/v2FloimtI
HejF+7xl9EYejLZFHq0hoHE2Eb24KyKwqdPGLhLxOX3rPIoISVmtcmchETtAMRre3gymOw8YcbXF
RGRE/klKTDMoFMbOg+U3cxcKRXwbFtKrdKkWIPvd7oNMnSWw1KESQwwHLzBXq5yi1y0/3iuZGSui
5/13JVUilPe5SRWxULm2+gdJx83nJgXt0lXB0DH6AEpjBkBFHvtgmDFMgIiNoZDDwQebDkZDNFaB

etc..

Upvotes: 4

Views: 1212

Answers (1)

Carpela
Carpela

Reputation: 2195

Dug around for many hours trying to solve this problem so adding my answer to a few of the threads I found...

https://stackoverflow.com/a/26604049/2386548

Hope that helps somebody...

Upvotes: 2

Related Questions