We are working on a service that parses users' emails. Below is an example of a raw email containing emoji. The hexadecimal sequence =F0=9F=99=82 in the email is the emoji 🙂. I was able to verify this here.
Why are the emojis encoded this way, and is there a way to parse them in Python 3?
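The escapes come from two layers of encoding. The text parts declare charset="UTF-8", so the emoji is first turned into UTF-8 bytes. The parts also declare Content-Transfer-Encoding: quoted-printable, which writes each non-ASCII byte as =XX. A minimal round trip with the standard library's quopri and unicodedata modules shows both layers:

```python
import quopri
import unicodedata

smiley = "\U0001F642"  # U+1F642, the 🙂 emoji from the question

# Layer 1: charset="UTF-8" turns the single code point into 4 bytes.
utf8_bytes = smiley.encode("utf-8")
print(utf8_bytes)  # b'\xf0\x9f\x99\x82'

# Layer 2: quoted-printable writes each non-ASCII byte as "=" plus two uppercase hex digits.
qp_bytes = quopri.encodestring(utf8_bytes)
print(qp_bytes)  # b'=F0=9F=99=82'

print(unicodedata.name(smiley))  # SLIGHTLY SMILING FACE
```

Reading the email means running these two steps in reverse: undo quoted-printable, then decode the bytes as UTF-8.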
I found that I can decode it with bytes.fromhex after manually removing the = symbols, and it works:
bytes.fromhex('F0 9F 99 82').decode('utf-8')
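Stripping the = signs only works on a run of pure escapes. The same email also uses =3D for a literal "=" and a trailing "=" as a soft line break, which that approach would mangle. The standard library's quopri module decodes quoted-printable directly. A minimal sketch, assuming the underlying charset is UTF-8 as declared in the email's Content-Type headers (decode_qp_utf8 is a hypothetical helper name):

```python
import quopri


def decode_qp_utf8(fragment: bytes) -> str:
    """Decode a quoted-printable fragment whose charset is assumed to be UTF-8."""
    # quopri handles =XX escapes, =3D (a literal '='), and '=' soft line breaks.
    return quopri.decodestring(fragment).decode("utf-8")


print(decode_qp_utf8(b"> =F0=9F=99=82=E2=98=BA"))
# > 🙂☺

# Adapted from the HTML part of the email: '=3D' and the soft line break
# ('=' at the end of a line) are both undone.
html = b'<div dir=3D"ltr">=F0=9F=99=82</div><div>fsdfsdf <br></=\ndiv>'
print(decode_qp_utf8(html))
# <div dir="ltr">🙂</div><div>fsdfsdf <br></div>
```

This still assumes you have already cut the body out of the message and know its charset; a full parser takes care of both.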
Is there a better way to handle this in Python 3?
Thanks in advance
Example of a raw email:
MIME-Version: 1.0
Date: Wed, 22 Sep 2021 18:45:41 +0530
References: <CAFsQotqyCTbnR7ANDZX9oHYtHwtSf-im8pNj6N9pMXytbn+kbw@mail.gmail.com>
In-Reply-To: <CAFsQotqyCTbnR7ANDZX9oHYtHwtSf-im8pNj6N9pMXytbn+kbw@mail.gmail.com>
Message-ID: <CAAqby4THk2GSGbD-RXy5-6bwaEKHuBeWARMjUDRVvi+4_OTFVg@mail.gmail.com>
Subject: Re: wowoww
From: email1 email1 <[email protected]>
To: email2 email2 <[email protected]>
Content-Type: multipart/related; boundary="000000000000a8da4205cc954fc4"
--000000000000a8da4205cc954fc4
Content-Type: multipart/alternative; boundary="000000000000a8da4105cc954fc3"
--000000000000a8da4105cc954fc3
Content-Type: text/plain; charset="UTF-8"
Content-Transfer-Encoding: quoted-printable
AmsterDam
On Tue, Aug 31, 2021 at 12:11 PM email1 email1 <[email protected]>
wrote:
> [image: unarchive.png]
> =F0=9F=99=82=E2=98=BA
> fsdfsdf
> sdf
> ds
> f
> sdf
> ds
> f
>
>
>
--000000000000a8da4105cc954fc3
Content-Type: text/html; charset="UTF-8"
Content-Transfer-Encoding: quoted-printable
<div dir=3D"ltr">AmsterDam<br></div><br><div class=3D"gmail_quote"><div dir=
=3D"ltr" class=3D"gmail_attr">On Tue, Aug 31, 2021 at 12:11 PM email1 email1=
i <<a href=3D"mailto:[email protected]">[email protected]</a=
>> wrote:<br></div><blockquote class=3D"gmail_quote" style=3D"margin:0px=
0px 0px 0.8ex;border-left:1px solid rgb(204,204,204);padding-left:1ex"><di=
v dir=3D"ltr"><img src=3D"cid:ii_kszpbrxo0" alt=3D"unarchive.png" width=3D"=
566" height=3D"283"><br><div>=F0=9F=99=82=E2=98=BA</div><div>fsdfsdf <br></=
div><div>sdf</div><div>ds</div><div>f</div><div>sdf</div><div>ds</div><div>=
f</div><div><br></div><br></div>
</blockquote></div>
--000000000000a8da4105cc954fc3--
--000000000000a8da4205cc954fc4
Content-Type: image/png; name="unarchive.png"
Content-Disposition: attachment; filename="unarchive.png"
Content-Transfer-Encoding: base64
X-Attachment-Id: ii_kszpbrxo0
Content-ID: <ii_kszpbrxo0>
--000000000000a8da4205cc954fc4--
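This is quoted-printable encoding, one of the content transfer encodings defined by MIME, the standard format for email (note the Content-Transfer-Encoding: quoted-printable header on each text part). It writes every non-ASCII byte as =XX, so the UTF-8 bytes of 🙂 (F0 9F 99 82) become =F0=9F=99=82. A literal "=" becomes =3D, and an "=" at the end of a line marks a soft line break. Rather than decoding it by hand, parse the email with the email package from the standard library (for example email.parser). It undoes both the transfer encoding and the charset and gives you normal strings.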
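A minimal sketch of that approach, assuming Python 3.6+ (for policy.default and EmailMessage.get_body). RAW is a trimmed copy of the question's email: most headers and the image part are dropped, and blank lines are restored between each header block and its body, because the parser relies on them. read_bodies is a hypothetical helper name:

```python
from email import policy
from email.parser import BytesParser

# Trimmed copy of the question's email (assumption: the real message has blank
# lines after each header block; the scraped version above lost them).
RAW = b"""MIME-Version: 1.0
Subject: Re: wowoww
Content-Type: multipart/alternative; boundary="000000000000a8da4105cc954fc3"

--000000000000a8da4105cc954fc3
Content-Type: text/plain; charset="UTF-8"
Content-Transfer-Encoding: quoted-printable

> =F0=9F=99=82=E2=98=BA
> fsdfsdf

--000000000000a8da4105cc954fc3
Content-Type: text/html; charset="UTF-8"
Content-Transfer-Encoding: quoted-printable

<div dir=3D"ltr"><div>=F0=9F=99=82=E2=98=BA</div><div>fsdfsdf <br></=
div></div>

--000000000000a8da4105cc954fc3--
"""


def read_bodies(raw: bytes) -> dict:
    """Return the decoded text/plain and text/html bodies, keyed by subtype."""
    # policy.default makes the parser return EmailMessage objects.
    msg = BytesParser(policy=policy.default).parsebytes(raw)
    bodies = {}
    for subtype in ("plain", "html"):
        part = msg.get_body(preferencelist=(subtype,))
        if part is not None:
            # get_content() undoes quoted-printable, then decodes with the part's charset.
            bodies[subtype] = part.get_content()
    return bodies


bodies = read_bodies(RAW)
print(bodies["plain"])
print(bodies["html"])
```

On the full message, get_body should look through the outer multipart/related wrapper to its first part, so the same calls should work unchanged. The image is then available as an attachment part, for example via msg.iter_attachments().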