Reputation: 3527
I'm using the PHP imap library to extract emails and save attachments.
When I want to get the attachments, I use the function
$partStruct = imap_bodystruct($imap, $mailNum, $partNum);
It is supposed to have the name of the file in the parameters
attribute, but here is what I have in this attribute:
(
    [type] => 3
    [encoding] => 3
    [ifsubtype] => 1
    [subtype] => VND.OPENXMLFORMATS-OFFICEDOCUMENT.SPREADSHEETML.SHEET
    [ifdescription] => 0
    [ifid] => 0
    [bytes] => 53308
    [ifdisposition] => 1
    [disposition] => ATTACHMENT
    [ifdparameters] => 0
    [ifparameters] => 1
    [parameters] => Array
        (
            [0] => stdClass Object
                (
                    [attribute] => NAME
                    [value] => =?KOI8-R?B?4snUy8/JztkueGxzeA==?=
                )
        )
)
As far as I can see, it's an xlsx file, but the name of the file is =?KOI8-R?B?4snUy8/JztkueGxzeA==?=
Has anyone seen that before? How do I get the original file name as UTF-8?
The email was sent from an iMac and the filename was originally in Russian. I can try to decode the name by stripping the =?KOI8-R?B?
part, but it looks like some kind of standard. Which standard is it?
Upvotes: 0
Views: 2210
Reputation: 32272
http://ncona.com/2011/06/using-utf-8-characters-on-an-e-mail-subject/
https://www.ietf.org/rfc/rfc1342.txt
So for =?KOI8-R?B?4snUy8/JztkueGxzeA==?=:
=? and ?= are the beginning and ending delimiters.
KOI8-R is the charset.
B is for Base64 encoding; Q would denote quoted-printable encoding.
4snUy8/JztkueGxzeA== is the encoded filename.
Upvotes: 1
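To make the layout described above concrete, here is a minimal sketch that splits one such string into its charset, encoding and text fields. The function name splitEncodedWord is made up for this illustration, and the regular expression is a simplified assumption about the format rather than a full implementation of the RFC grammar.

```php
<?php
// Hypothetical helper (not part of PHP or the imap extension):
// splits a single encoded-word of the form =?charset?encoding?text?=
function splitEncodedWord(string $word): ?array
{
    // Simplified pattern: charset, then B or Q, then the encoded text
    $pattern = '/^=\?(?<charset>[^?\s]+)\?(?<encoding>[BbQq])\?(?<text>[^?\s]*)\?=$/';
    if (preg_match($pattern, $word, $m) !== 1) {
        return null; // the string is not an encoded-word
    }
    return [
        'charset'  => $m['charset'],
        'encoding' => strtoupper($m['encoding']),
        'text'     => $m['text'],
    ];
}

print_r(splitEncodedWord('=?KOI8-R?B?4snUy8/JztkueGxzeA==?='));
```

For the file name from the question this should report KOI8-R as the charset, B as the encoding and 4snUy8/JztkueGxzeA== as the text; a plain name such as report.xlsx yields null.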
Reputation: 3527
Well, I've figured out that it is a Base64-encoded filename.
Here is how I've managed to get it out, but I'm not sure it will work next time :)
$str = '=?KOI8-R?B?4snUy8/JztkueGxzeA==?=';
// Split the string on "?": the parts are "=", charset, encoding, text, "="
$arrStr = explode('?', $str);
// The second part should be a charset name (KOI8-R in my case)
if (isset($arrStr[3]) && in_array($arrStr[1], mb_list_encodings())) {
    switch (strtoupper($arrStr[2])) {
        case 'B': // Base64 encoded
            $str = base64_decode($arrStr[3]);
            break;
        case 'Q': // like quoted-printable, but "_" stands for a space
            $str = quoted_printable_decode(str_replace('_', ' ', $arrStr[3]));
            break;
    }
    // Convert it to UTF-8
    $str = iconv($arrStr[1], 'UTF-8', $str);
}
echo $str; // Биткоины.xlsx
Any comments on why the string should look like that (with all those =, ? and B) are welcome.
It is definitely some kind of standard, because LinkedIn uses the same format to encode Russian names, but which standard is it?
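For what it's worth, this =?charset?encoding?text?= form is the "encoded-word" syntax for MIME headers (RFC 1342, later replaced by RFC 2047), and PHP has built-in decoders for it, so the manual splitting may not be needed. Below is a minimal sketch using iconv_mime_decode; it assumes the iconv extension is available and knows the KOI8-R charset. The imap and mbstring extensions offer imap_mime_header_decode and mb_decode_mimeheader for the same purpose.

```php
<?php
// Sample value copied from the question
$raw = '=?KOI8-R?B?4snUy8/JztkueGxzeA==?=';

// Decode the MIME header value straight to UTF-8.
// ICONV_MIME_DECODE_CONTINUE_ON_ERROR asks iconv to keep going on malformed input.
$name = iconv_mime_decode($raw, ICONV_MIME_DECODE_CONTINUE_ON_ERROR, 'UTF-8');

if ($name === false) {
    // Decoding failed (for example an unknown charset): keep the raw value
    $name = $raw;
}

echo $name, "\n"; // Биткоины.xlsx
```

A value without any encoded-word, such as a plain ASCII file name, should pass through unchanged, so the same call can be applied to every attachment name.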
Upvotes: 4