Prosto Trader

Reputation: 3527

IMAP PHP attachment file name encoding

I'm using the PHP IMAP library to extract emails and save attachments.

When I want to get an attachment, I use this function:

$partStruct = imap_bodystruct($imap, $mailNum, $partNum);

It is supposed to have the file name in the parameters attribute, but here is what I get in that attribute:

(
    [type] => 3
    [encoding] => 3
    [ifsubtype] => 1
    [subtype] => VND.OPENXMLFORMATS-OFFICEDOCUMENT.SPREADSHEETML.SHEET
    [ifdescription] => 0
    [ifid] => 0
    [bytes] => 53308
    [ifdisposition] => 1
    [disposition] => ATTACHMENT
    [ifdparameters] => 0
    [ifparameters] => 1
    [parameters] => Array
        (
            [0] => stdClass Object
                (
                    [attribute] => NAME
                    [value] => =?KOI8-R?B?4snUy8/JztkueGxzeA==?=
                )

        )

)

As far as I can see, it's an xlsx file, but the file name is =?KOI8-R?B?4snUy8/JztkueGxzeA==?=

Has anyone seen this before? How do I get the original file name in UTF-8?

The email was sent from an iMac and the file name was originally in Russian. I could try to decode the name by stripping the =?KOI8-R?B? stuff, but it looks like some kind of standard. Which standard is it?

Upvotes: 0

Views: 2210

Answers (2)

Sammitch

Reputation: 32272

http://ncona.com/2011/06/using-utf-8-characters-on-an-e-mail-subject/
https://www.ietf.org/rfc/rfc1342.txt

So for: =?KOI8-R?B?4snUy8/JztkueGxzeA==?=

  • =? and ?= are the beginning/ending delimiters.
  • KOI8-R is the charset of the original text.
  • B is for Base64 encoding; Q would denote a quoted-printable-like encoding (in which _ stands for a space).
  • 4snUy8/JztkueGxzeA== is the encoded file name.
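
The breakdown above can be sketched in PHP. This is only an illustration of the format from the RFC linked above (later revised as RFC 2047), not a full decoder; parseEncodedWord() is a hypothetical helper name, and it assumes the input is exactly one encoded word with no surrounding text.

```php
<?php
// Hypothetical helper (not part of PHP or the imap extension):
// split a single "=?charset?encoding?payload?=" word into its parts.
function parseEncodedWord(string $word): ?array
{
    // Delimiters "=?" and "?=", with "?" separating the three fields.
    if (!preg_match('/^=\?([^?]+)\?([BbQq])\?([^?]*)\?=$/', $word, $m)) {
        return null; // not a single encoded word
    }

    return [
        'charset'  => strtoupper($m[1]), // e.g. KOI8-R
        'encoding' => strtoupper($m[2]), // B (Base64) or Q
        'payload'  => $m[3],             // still encoded at this point
    ];
}

$parts = parseEncodedWord('=?KOI8-R?B?4snUy8/JztkueGxzeA==?=');
print_r($parts);
```

The payload still has to be Base64- or Q-decoded and then converted from the declared charset to UTF-8.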

Upvotes: 1

Prosto Trader

Reputation: 3527

Well, I've figured out that the file name is Base64 encoded.

Here is how I managed to get it out, but I'm not sure it will work next time :)

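$str = '=?KOI8-R?B?4snUy8/JztkueGxzeA==?=';

// Split "=?charset?encoding?payload?=" on "?" (I don't know how it is formed, but still)
$arrStr = explode('?', $str);

// A single encoded word gives exactly 5 parts: "=", charset, encoding, payload, "="
$knownCharsets = array_map('strtoupper', mb_list_encodings());

// The second part should be a charset name (KOI8-R in my case); compare case-insensitively
if (count($arrStr) === 5 && in_array(strtoupper($arrStr[1]), $knownCharsets, true)) {

    $decoded = false;

    switch (strtoupper($arrStr[2])) {

        case 'B': // Base64 encoded
            $decoded = base64_decode($arrStr[3]);
            break;

        case 'Q': // quoted-printable-like encoding, where "_" stands for a space
            $decoded = quoted_printable_decode(str_replace('_', ' ', $arrStr[3]));
            break;

    }

    // Convert to UTF-8 only if decoding succeeded; otherwise keep the raw string
    if ($decoded !== false) {
        $converted = iconv($arrStr[1], 'UTF-8', $decoded);
        if ($converted !== false) {
            $str = $converted;
        }
    }
}


echo $str; // Биткоины.xlsx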
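
A possibly more robust variant, as a sketch only: PHP's iconv extension has iconv_mime_decode(), which handles this header format directly, so the manual splitting above may not be needed. This assumes the iconv extension is enabled and that the underlying iconv library knows the KOI8-R charset; the variable names are just for illustration.

```php
<?php
// Assumption: the iconv extension is available (it usually is by default).
$raw = '=?KOI8-R?B?4snUy8/JztkueGxzeA==?=';

// Decode the MIME-encoded value and convert it to UTF-8 in one step.
// ICONV_MIME_DECODE_CONTINUE_ON_ERROR asks iconv to skip malformed parts
// instead of giving up on the whole string.
$decoded = iconv_mime_decode($raw, ICONV_MIME_DECODE_CONTINUE_ON_ERROR, 'UTF-8');

// Fall back to the raw value if decoding failed entirely.
if ($decoded === false) {
    $decoded = $raw;
}

echo $decoded, PHP_EOL;
```

Names without any encoded part should pass through unchanged, so the same call can be applied to every attachment name.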

Any comments on why the string should look like that (with all those = and ? and B) are welcome.

It is definitely some kind of standard, because LinkedIn uses the same format to encode Russian names, but which standard is it?

Upvotes: 4
