Reputation: 1779
I'm trying to work out how best to connect (thread) a chain of emails. This seems like such a common problem that I was surprised I couldn't easily find information on how other people have dealt with it. The only thing I found was a post about JWZ threading, which seemed more concerned with piecing together a thread inside a single email. I was wondering if anyone could point me to some current solutions.
I'm using the thoughtbot griddler gem to process incoming emails into a Message model and a separate Contact model, and I have a third model, Reply, for storing replies.
My current thinking is to thread them by the unique contact and the subject line. But the subject line will change slightly, e.g. from "This subject" to "Re: re: This subject". Should I use a regex to strip out the "re:"s, or something like amatch to do approximate string comparisons?
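A minimal sketch of the regex approach, assuming only "Re:"/"Fwd:"-style prefixes (optionally with a counter like "Re[2]:") need stripping. `normalize_subject` and `REPLY_PREFIX` are hypothetical names, not part of griddler or amatch:

```ruby
# Hypothetical pattern: one leading "Re:", "Fw:" or "Fwd:" prefix, any case,
# optionally with a counter such as "Re[2]:".
REPLY_PREFIX = /\A\s*(?:re|fwd?)(?:\[\d+\])?\s*:\s*/i

# Hypothetical helper: turns a subject line into a grouping key.
def normalize_subject(subject)
  s = subject.to_s.dup
  # Remove one prefix per pass until none are left ("Re: re: ..." needs two).
  s = s.sub(REPLY_PREFIX, '') while s.match?(REPLY_PREFIX)
  # Collapse repeated spaces and ignore case so near-identical subjects match.
  s.strip.squeeze(' ').downcase
end

puts normalize_subject('Re: re: This subject')      # prints "this subject"
puts normalize_subject('FWD: Re[2]:  This subject') # prints "this subject"
```

The result can be used directly as part of a grouping key. Anything fancier, such as mailing-list tags like "[list]" or localized prefixes such as "AW:", would need extra patterns.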
But then, what should happen when the same subject appears for the same user two months later? I could also add some logic based on the date so that threads only include recent emails. And might there be something else useful stored in the email headers themselves?
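The date-window idea could look roughly like this. It's only a sketch: `Email` is a hypothetical stand-in for the Message model, `subject_key` is assumed to hold an already-normalized subject, and the 30-day gap is an arbitrary assumption:

```ruby
require 'time'

# Hypothetical stand-in for the Message model.
Email = Struct.new(:contact, :subject_key, :sent_at)

# Assumption: a gap of more than 30 days ends a thread.
MAX_GAP = 30 * 24 * 60 * 60 # seconds

# Groups emails by (contact, normalized subject), but starts a new thread
# whenever the gap since that thread's latest email exceeds max_gap.
def group_by_recency(emails, max_gap: MAX_GAP)
  threads = []
  open_threads = {} # [contact, subject_key] => thread still accepting emails
  emails.sort_by(&:sent_at).each do |email|
    key = [email.contact, email.subject_key]
    current = open_threads[key]
    if current.nil? || email.sent_at - current.last.sent_at > max_gap
      current = []
      threads << current
      open_threads[key] = current
    end
    current << email
  end
  threads
end

emails = [
  Email.new('alice@example.com', 'this subject', Time.parse('2024-01-01')),
  Email.new('alice@example.com', 'this subject', Time.parse('2024-01-03')),
  Email.new('alice@example.com', 'this subject', Time.parse('2024-03-10'))
]
p group_by_recency(emails).map(&:size) # prints [2, 1]
```

Measuring the gap from the thread's most recent email, rather than its first, keeps long-running but active conversations together.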
I have a rough idea of how to do it; I'm just curious to see some current implementations, and I can't seem to find any.
Any pointers would be greatly appreciated!
Upvotes: 18
Views: 1337
Reputation: 7631
There is a new gem named Msgthr, which is an implementation of JWZ's algorithm. It doesn't match subjects, senders or dates, so it's not exactly what you're looking for, but I think it's a good start.
The neatest thing about Msgthr is that it's container-agnostic, so you don't have to install dependencies such as TMail, as you do with Frederik Dietz's Ruby port. This also means it can be used for other types of communication.
Here's some sample code; given a list of messages, let's group them into threads:
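```ruby
require 'msgthr'

thr = Msgthr.new

# Start with every message in its own one-element "thread" array.
threads = {}
[1, 11, 12, 2, 21, 211].each { |id| threads[id] = [id] }

# Wrap Msgthr#add: whenever it links a child container to a parent,
# make the child share the parent's thread array.
my_add = lambda do |id, refs, msg|
  thr.add(id, refs, msg) do |parent, child|
    threads[child.mid] = threads[parent.mid]
  end
end

# Create the following structure:
# 1
# |-- 1.1
# `-- 1.2
# 2
# `-- 2.1
#     `-- 2.1.1
my_add.call(1, nil, '1')
my_add.call(11, [1], '1.1')
my_add.call(12, [1], '1.2')
my_add.call(2, nil, '2')
my_add.call(21, [2], '2.1')
my_add.call(211, [21], '2.1.1')

# Build the thread trees, then label each shared thread array
# with the message of its root container.
thr.thread!
thr.rootset.each do |cnt|
  threads[cnt.mid][0] = cnt.msg
end
```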
Disclosure: I'm one of the contributors to the gem.
Upvotes: 0
Reputation: 1615
Email threads are effectively a linked list (strictly a tree, since one message can have several replies), and the headers contain enough information to reconstruct it from its component parts.
Inspect the email headers and look for a few specific ones.
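The key ones are Message-ID, In-Reply-To and References. Message-ID uniquely identifies each message, In-Reply-To gives the id of the message being replied to, and References lists the ids of earlier messages in the thread. Together they tell you which message was replied to and which other ids belong to the thread.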
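A minimal Ruby sketch of that idea, assuming each message is available as a plain hash of raw header values. The helper names `message_ids`, `ancestor_ids` and `thread_roots` are hypothetical, the sample ids are made up, and this is a simplified parent walk rather than the full JWZ algorithm:

```ruby
# Message ids look like <local-part@domain>; References can list several,
# oldest first, so scan for every <...> token.
def message_ids(value)
  value.to_s.scan(/<[^>]+>/)
end

# Ancestors of a message, oldest first: References if present, otherwise
# In-Reply-To (which normally names just the direct parent).
def ancestor_ids(headers)
  refs = message_ids(headers['References'])
  refs.empty? ? message_ids(headers['In-Reply-To']) : refs
end

# Maps each Message-ID to the oldest message we hold in its thread.
# Missing intermediate messages are skipped by jumping to the nearest
# ancestor we do have; `seen` guards against reference loops.
def thread_roots(messages)
  by_id = messages.to_h { |h| [message_ids(h['Message-ID']).first, h] }
  by_id.keys.to_h do |id|
    root = id
    seen = { id => true }
    loop do
      parent = ancestor_ids(by_id[root]).reverse.find { |r| by_id.key?(r) }
      break if parent.nil? || seen[parent]
      seen[parent] = true
      root = parent
    end
    [id, root]
  end
end

messages = [
  { 'Message-ID' => '<1@example.com>', 'Subject' => 'This subject' },
  { 'Message-ID' => '<2@example.com>', 'In-Reply-To' => '<1@example.com>',
    'References' => '<1@example.com>', 'Subject' => 'Re: This subject' },
  { 'Message-ID' => '<3@example.com>', 'In-Reply-To' => '<2@example.com>',
    'References' => '<1@example.com> <2@example.com>',
    'Subject' => 'Re: re: This subject' },
  { 'Message-ID' => '<4@example.com>', 'Subject' => 'This subject' }
]
# <4@example.com> gets its own root even though its subject matches.
p thread_roots(messages)
```

Because the grouping follows the ids rather than the subject, a new conversation that happens to reuse an old subject is not merged into the old thread.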
The easiest way to see an email's full headers is to open the message in Gmail and choose 'Show original' from the More menu.
Upvotes: 7