Avery Payne

Reputation: 1748

Documentation on Apple Mail's .emlx data structure(s) (for conversion purposes)?

This appears to be a rare gem: where to find documentation on the structure of Apple Mail's .emlx files (and their partial variants, and the meaning of the directory structures). The docs do not appear to exist on Apple's site, nor can I find any reasonable mention of it via Google.

The point of this is to create a bash/ruby/python/insert-script-language-here script that converts a mess of these files into something usable/pliable, like Maildir or Mbox. The ultimate goal is to migrate a snapshot of a user's /Library/Mail store into an existing Dovecot setup, which uses a form of Maildir.
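For reference, here is a minimal Python sketch of that kind of conversion script. It assumes the layout reported in the answers (a byte count on the first line, then the MIME message, then an XML plist) and uses the standard-library `mailbox` module. The function names `read_emlx` and `convert_tree` are hypothetical. The flag mapping follows jwz's bit table quoted in one of the answers, and skipping `.partial.emlx` files is an assumption. All messages land in a single Maildir, so mapping Apple Mail's mailbox directories onto Dovecot folders is left out.

```python
import mailbox
import plistlib
from pathlib import Path


def read_emlx(path: Path):
    """Hypothetical helper: return (mime_bytes, plist_dict) for one .emlx file."""
    data = path.read_bytes()
    count_line, rest = data.split(b"\n", 1)
    count = int(count_line)  # byte length of the MIME message (may be space-padded)
    trailer = rest[count:].lstrip()  # the XML plist that follows the message
    meta = plistlib.loads(trailer) if trailer else {}
    return rest[:count], meta


def convert_tree(src: Path, dest: Path) -> int:
    """Copy every complete *.emlx under src into one Maildir at dest."""
    box = mailbox.Maildir(dest, create=True)
    converted = 0
    for path in sorted(src.rglob("*.emlx")):
        if path.name.endswith(".partial.emlx"):
            continue  # assumption: partial files may lack the full message
        mime, meta = read_emlx(path)
        msg = mailbox.MaildirMessage(mime)
        flags = meta.get("flags", 0)
        if flags & (1 << 0):  # jwz bit 0: read -> Maildir "S" (seen)
            msg.add_flag("S")
        if flags & (1 << 2):  # jwz bit 2: answered -> Maildir "R" (replied)
            msg.add_flag("R")
        if flags & (1 << 4):  # jwz bit 4: flagged -> Maildir "F"
            msg.add_flag("F")
        msg.set_subdir("cur")  # already-seen mail normally lives in cur/
        box.add(msg)
        converted += 1
    return converted
```

For example, `convert_tree(Path("mail-snapshot"), Path("Maildir"))` (hypothetical paths) returns the number of messages copied.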

Yes, I am aware of this program, but it does not provide the solution I am after. Converting 20 mailboxes by hand and manually inserting them into an existing installation would take more hours than writing a script that converts the messages and then automatically stores them where they belong. Never mind that there are potentially a half-dozen more users who will require this procedure. So it's worth my time to script it up.

Please vote to close the duplicate of this question while it is pending deletion, instead of voting for this question to close. For some reason, there are occasional posting glitches when using Chrome as a browser.

FOLLOW-UP: It appears that the format really is undocumented, and that most sources have reverse-engineered it. If I have time I will attempt to do so myself, and if I'm successful, I will post a second follow-up with the details of my findings.

Upvotes: 8

Views: 6166

Answers (5)

Imdat Solak

Reputation: 11

The original emlx2mbox Ruby script was written a long time ago. I have updated it to run in a modern Ruby environment. Please check it out at https://github.com/imdatsolak/elmx2mbox

Upvotes: 0

Michael

Reputation: 717

As of 2020, Python has a lightweight emlx library.

pip install emlx

and then

>>> import emlx
>>> m = emlx.read("12345.emlx")

>>> m.headers
{'Subject': 'Re: Emlx library ✉️',
 'From': 'Michael <[email protected]>',
 'Date': 'Thu, 30 Jan 2020 20:25:43 +0100',
 'Content-Type': 'text/plain; charset=utf-8',
 ...}
>>> m.headers['Subject']
'Re: Emlx library ✉️'

>>> m.plist
{'color': '000000',
 'conversation-id': 12345,
 'date-last-viewed': 1580423184,
 'flags': {...},
 ...}

>>> m.flags
{'read': True, 'answered': True, 'attachment_count': 2}

Upvotes: 2

olekeh

Reputation: 537

I am using mailcore2 to parse .eml messages. To make it work with .emlx files, I only had to remove the first line (the one containing a number). The message itself carries its length, so the XML block at the end does not need to be removed.
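Whether trailing data after the message is ignored depends on the parser. As a hedged illustration in Python (not mailcore2): the standard `email` package keeps the plist as part of a single-part body if you only drop the first line, whereas slicing with the byte count stops exactly at the end of the MIME message. The in-memory sample and both function names are hypothetical.

```python
import email
from email import policy


def message_by_dropping_first_line(data: bytes) -> bytes:
    """Discard only the byte-count line, keeping everything after it."""
    return data.split(b"\n", 1)[1]


def message_by_byte_count(data: bytes) -> bytes:
    """Use the count on the first line to stop where the MIME message ends."""
    count_line, rest = data.split(b"\n", 1)
    return rest[:int(count_line)]  # int() ignores the padding spaces


# Hypothetical in-memory .emlx: count line, single-part MIME message, plist.
mime = b"Subject: demo\nContent-Type: text/plain\n\nhello\n"
plist = b'<?xml version="1.0" encoding="UTF-8"?>\n<plist version="1.0"><dict/></plist>\n'
sample = str(len(mime)).encode() + b"        \n" + mime + plist

for extract in (message_by_dropping_first_line, message_by_byte_count):
    body = email.message_from_bytes(extract(sample), policy=policy.default).get_content()
    print(extract.__name__, "plist in body:", "<?xml" in body)
```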

Here is how I did it in Objective-C/Cocoa (MCOMessageParser comes from the mailcore2 framework):

// Returns a mailcore2 parser for an .eml or .emlx file, or nil on failure.
// (The original returned a Documents object built from the parser; that part was not shown.)
-(MCOMessageParser *)ParseEmlMessageforPath:(NSString *)fullpath filename:(NSString *)filename {
    NSLog(@"fullpath = %@", fullpath);
    NSError *error = nil;
    NSData *fileContents = [NSData dataWithContentsOfFile:fullpath options:NSDataReadingMappedIfSafe error:&error];
    if (fileContents == nil) {
        [[NSApplication sharedApplication] presentError:error];
        return nil;
    }
    if (![[fullpath pathExtension] isEqualToString:@"emlx"]) {
        // Plain .eml: hand the whole file to the parser.
        return [MCOMessageParser messageParserWithData:fileContents];
    }
    // .emlx: skip the first line (the byte count) and parse the rest.
    NSData *linefeed = [@"\n" dataUsingEncoding:NSUTF8StringEncoding];
    NSUInteger filelength = [fileContents length];
    // Only search the first 20 bytes, but never past the end of a short file.
    NSRange searchRange = NSMakeRange(0, MIN((NSUInteger)20, filelength));
    NSRange pos = [fileContents rangeOfData:linefeed options:0 range:searchRange];
    if (pos.location == NSNotFound) {
        return nil;
    }
    NSData *subcontent = [fileContents subdataWithRange:NSMakeRange(pos.location + 1, filelength - pos.location - 1)];
    return [MCOMessageParser messageParserWithData:subcontent];
}

And there you go.

Upvotes: 1

karlcow

Reputation: 6972

Some more information documenting the emlx format.

An .emlx file is composed of:

  • a byte count for the message on the first line
  • a MIME dump of the message
  • an XML plist
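A minimal Python sketch that splits a raw .emlx file into these three parts, assuming the count is the exact byte length of the MIME part and the plist follows right after it. `split_emlx` is a hypothetical helper, not part of any library.

```python
import plistlib


def split_emlx(data: bytes):
    """Hypothetical helper: return (byte_count, mime_bytes, plist_dict)."""
    count_line, rest = data.split(b"\n", 1)
    count = int(count_line)          # first line: byte count (may be space-padded)
    mime = rest[:count]              # second part: the MIME message
    trailer = rest[count:].lstrip()  # third part: the XML plist
    meta = plistlib.loads(trailer) if trailer else {}
    return count, mime, meta


# Usage with a small in-memory example.
mime = b"Subject: very simple\n\nmessage Foo\n"
raw = str(len(mime)).encode() + b"\n" + mime + plistlib.dumps({"subject": "very simple"})
count, body, meta = split_emlx(raw)
print(count, body == mime, meta["subject"])  # 34 True very simple
```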

The XML plist contains keys such as:

<?xml version="1.0" encoding="UTF-8"?>
<!DOCTYPE plist PUBLIC "-//Apple//DTD PLIST 1.0//EN" "http://www.apple.com/DTDs/PropertyList-1.0.dtd">
<plist version="1.0">
<dict>
        <key>date-sent</key>
        <real>1362211252</real>
        <key>flags</key>
        <integer>8590195713</integer>
        <key>original-mailbox</key>
        <string>imap://****@127.0.0.1:143/mail/2013/03</string>
        <key>remote-id</key>
        <string>252</string>
        <key>subject</key>
        <string>Re: Foobar</string>
</dict>
</plist>
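The plist can be read with Python's standard `plistlib`. This sketch loads the example above (with the closing `</plist>` tag) and converts `date-sent`, which appears to be Unix time in seconds: in the full example further down, `1364598546` matches the `Date: Fri, 29 Mar 2013 19:09:06 -0400` header.

```python
import plistlib
from datetime import datetime, timezone

PLIST = b"""<?xml version="1.0" encoding="UTF-8"?>
<!DOCTYPE plist PUBLIC "-//Apple//DTD PLIST 1.0//EN" "http://www.apple.com/DTDs/PropertyList-1.0.dtd">
<plist version="1.0">
<dict>
        <key>date-sent</key>
        <real>1362211252</real>
        <key>flags</key>
        <integer>8590195713</integer>
        <key>original-mailbox</key>
        <string>imap://****@127.0.0.1:143/mail/2013/03</string>
        <key>remote-id</key>
        <string>252</string>
        <key>subject</key>
        <string>Re: Foobar</string>
</dict>
</plist>
"""

meta = plistlib.loads(PLIST)
# <real> becomes a float; interpret it as seconds since the Unix epoch (UTC).
sent = datetime.fromtimestamp(meta["date-sent"], tz=timezone.utc)
print(sent.isoformat())  # 2013-03-02T08:00:52+00:00
print(meta["flags"], meta["remote-id"], meta["subject"])
```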

The flags field has been described by jwz as a bit field with the following layout:

0      read                      1 << 0
1      deleted                   1 << 1
2      answered                  1 << 2
3      encrypted                 1 << 3
4      flagged                   1 << 4
5      recent                    1 << 5
6      draft                     1 << 6
7      initial (no longer used)  1 << 7
8      forwarded                 1 << 8
9      redirected                1 << 9
10-15  attachment count          3F << 10 (6 bits)
16-22  priority level            7F << 16 (7 bits)
23     signed                    1 << 23
24     is junk                   1 << 24
25     is not junk               1 << 25
26-28  font size delta           7 << 26 (3 bits)
29     junk mail level recorded  1 << 29
30     highlight text in toc     1 << 30
31     (unused)
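A Python sketch that decodes the integer according to this table. The field names are my own, and the table itself is reverse-engineered. Bits above 31 are not covered by it and are ignored here (the sample value 8590195713 has bit 33 set).

```python
def decode_emlx_flags(flags: int) -> dict:
    """Decode an .emlx 'flags' value using jwz's bit layout."""

    def bit(n: int) -> bool:
        return bool((flags >> n) & 1)

    return {
        "read": bit(0),
        "deleted": bit(1),
        "answered": bit(2),
        "encrypted": bit(3),
        "flagged": bit(4),
        "recent": bit(5),
        "draft": bit(6),
        "forwarded": bit(8),
        "redirected": bit(9),
        "attachment_count": (flags >> 10) & 0x3F,  # 6 bits
        "priority": (flags >> 16) & 0x7F,          # 7 bits
        "signed": bit(23),
        "is_junk": bit(24),
        "is_not_junk": bit(25),
        "font_size_delta": (flags >> 26) & 0x7,    # 3 bits
        "junk_level_recorded": bit(29),
        "highlight_in_toc": bit(30),
    }


print(decode_emlx_flags(8590195713))
```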

Here is a simple message I sent to myself, with some details redacted, so you can see the full structure of an .emlx file:

875       
X-Spam-Checker-Version: SpamAssassin 3.3.2 (2011-06-06) on ******.*********.***
X-Spam-Level: 
X-Spam-Status: No, score=-3.2 required=4.2 tests=BAYES_00,RP_MATCHES_RCVD,
        SPF_PASS,TVD_SPACE_RATIO autolearn=ham version=3.3.2
Received: from [127.0.0.1] (******.*********.*** [***.**.**.**])
        by ******.*********.*** (8.14.5/8.14.5) with ESMTP id r2TN8m4U099571
        for <****@*********.***>; Fri, 29 Mar 2013 19:08:48 -0400 (EDT)
        (envelope-from ****@*********.***)
Subject: very simple
From: Karl Dubost <****@*********.***>
Content-Type: text/plain; charset=us-ascii
Message-Id: <4E83618E-BB56-404F-8595-87352648ADC7@*********.***>
Date: Fri, 29 Mar 2013 19:09:06 -0400
To: Karl Dubost <****@*********.***>
Content-Transfer-Encoding: 7bit
Mime-Version: 1.0 (Apple Message framework v1283)
X-Mailer: Apple Mail (2.1283)

message Foo
-- 
Karl Dubost
http://www.la-grange.net/karl/
<?xml version="1.0" encoding="UTF-8"?>
<!DOCTYPE plist PUBLIC "-//Apple//DTD PLIST 1.0//EN" "http://www.apple.com/DTDs/PropertyList-1.0.dtd">
<plist version="1.0">
<dict>
        <key>date-sent</key>
        <real>1364598546</real>
        <key>flags</key>
        <integer>8590195713</integer>
        <key>original-mailbox</key>
        <string>imap://********@127.0.0.1:11143/mail/2013/03</string>
        <key>remote-id</key>
        <string>41147</string>
        <key>subject</key>
        <string>very simple</string>
</dict>
</plist>

Upvotes: 4

Matt G

Reputation: 2373

Here is an emlx2mbox converter in Ruby: Mailbox Converter.

I don't think it was written from any documentation of the spec, but it has undergone multiple updates, so hopefully evolved to handle at least some of the quirks of the format. The source code is about 250 lines long, and it looks readable and well-commented.
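For readers who would rather not run Ruby, a similar conversion can be sketched with Python's standard `mailbox` module. This is not a port of Mailbox Converter. It assumes the emlx layout described in the other answers, skips `.partial.emlx` files, and maps only the read bit (bit 0 in jwz's table) to the mbox `R` status flag. `emlx_dir_to_mbox` is a hypothetical name.

```python
import mailbox
import plistlib
from pathlib import Path


def emlx_dir_to_mbox(src: Path, dest: Path) -> int:
    """Append every complete *.emlx under src to the mbox file dest."""
    box = mailbox.mbox(dest, create=True)
    count = 0
    try:
        for path in sorted(src.rglob("*.emlx")):
            if path.name.endswith(".partial.emlx"):
                continue  # assumption: skip partially downloaded messages
            data = path.read_bytes()
            size_line, rest = data.split(b"\n", 1)
            size = int(size_line)  # byte length of the MIME message
            trailer = rest[size:].lstrip()
            meta = plistlib.loads(trailer) if trailer else {}
            msg = mailbox.mboxMessage(rest[:size])
            if meta.get("flags", 0) & 1:  # jwz bit 0: read
                msg.add_flag("R")
            box.add(msg)
            count += 1
    finally:
        box.close()  # close() flushes pending writes to disk
    return count
```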

Upvotes: 3
