JVXR

Reputation: 1312

Concurrent email processing (without spamming)

I have a scenario where I need to process a CSV file that contains simulation data from a device. Each line is an output representing the device state at a point in time. While processing each line, specific columns are checked for variance / anomalies. If there are anomalies, an email has to be sent to a group of people describing the detected anomaly. However, to avoid spamming them (a CSV can occasionally have several hundred thousand lines), I have to maintain a threshold of X seconds, i.e. if a mail was sent for the same anomaly from the same condition (from the same device being simulated) less than X seconds ago, I must skip sending the mail.

The solution I currently use seems clumsy to me:

1) I save the mail message and device id together with the anomaly detection time.

2) I create one "alert" per email id with a create-time-stamp, sent-time-stamp, message-id (from step 1) and device-id, with status "NEW".

3) Before sending each mail, I do a database read to check whether the last email with status 'SENT' is older than the threshold (now - sent-time-stamp > threshold). If yes, I fetch all the alerts by message-id, send them out and update their status to SENT; otherwise I just ignore them.

I started off with a thread pool executor and realized halfway through that the read-then-send check can fail once multiple threads try to send emails and update the sent-time-stamp at the same time. So for now I have set the thread pool size to 1, which defeats the purpose of an executor. (I don't have row-level locking, as I use Mongo as the backing db.) The backing datastore has to be a NoSQL store because the fields can vary drastically and the data will not fit on a single machine's disk as more simulations get piped in.
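To make the race concrete, here is a minimal single-JVM sketch of how the "check the last sent time, then record the new one" step can be done as one atomic operation, so that several sender threads cannot both pass the check. The class name `AlertThrottle`, the key format `device-1|overheat` and the method `tryAcquire` are hypothetical names chosen for illustration. It only covers one process; in the distributed setup the same check-and-set would presumably have to happen in the shared store as a single conditional update, which is not shown here.

```java
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.TimeUnit;
import java.util.concurrent.atomic.AtomicInteger;

/** Hypothetical in-memory throttle: at most one alert per key every thresholdMillis. */
class AlertThrottle {
    // key = device id + anomaly type (assumed format), value = time of the last sent mail
    private final ConcurrentHashMap<String, Long> lastSent = new ConcurrentHashMap<>();
    private final long thresholdMillis;

    AlertThrottle(long thresholdMillis) {
        this.thresholdMillis = thresholdMillis;
    }

    /**
     * Atomically checks the last sent time for the key and, if nothing was sent
     * yet or (now - lastSent > threshold), records nowMillis as the new sent time.
     * Returns true only for the one caller that is allowed to send.
     */
    boolean tryAcquire(String key, long nowMillis) {
        boolean[] allowed = {false};
        // compute() runs the check and the update as one atomic step per key
        lastSent.compute(key, (k, previous) -> {
            if (previous == null || nowMillis - previous > thresholdMillis) {
                allowed[0] = true;
                return nowMillis;
            }
            return previous; // too recent: keep the old timestamp, suppress the mail
        });
        return allowed[0];
    }

    public static void main(String[] args) throws InterruptedException {
        AlertThrottle throttle = new AlertThrottle(5_000);
        ExecutorService pool = Executors.newFixedThreadPool(8);
        AtomicInteger sent = new AtomicInteger();
        // 100 concurrent detections of the same anomaly at the same instant
        for (int i = 0; i < 100; i++) {
            pool.submit(() -> {
                if (throttle.tryAcquire("device-1|overheat", 1_000L)) {
                    sent.incrementAndGet(); // the real mail would be sent here
                }
            });
        }
        pool.shutdown();
        pool.awaitTermination(10, TimeUnit.SECONDS);
        System.out.println("emails sent: " + sent.get());
    }
}
```

With this shape the thread pool can stay larger than 1, because the decision to send is made inside the atomic update rather than by a separate read followed by a write.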

The application is distributed - so a csv file can be picked by any random node to process and notify.

Would Akka be a good candidate for this kind of process? Any insights or lessons from prior experience implementing this are welcome (I have to stick with the JVM).

Upvotes: 4

Views: 104

Answers (2)

pushy

Reputation: 9635

Akka could help you with the distribution if you use Akka Cluster. That gives you a dynamic peer-to-peer cluster across your nodes, which is very nice if you need it. Apart from that, Akka is message-based, which sounds like a good match for modelling your domain.

However, be aware that Akka is based on the actor programming model, which is great but really different from multi-threaded programming in Java, so there is a learning curve. If you need a quick solution, it will probably not be the best match. If you are willing to put some time into this and learn what Akka is about, it could be a good match.

Upvotes: 1

Alex Chernyshev

Reputation: 1745

You can use distributed Akka as a replacement (see a good tutorial here: http://www.addthis.com/blog/2013/04/16/building-a-distributed-system-with-akka-remote-actors/#.U-HWzvmSzy4), but why? Just tweak what already works:

1) Remove the Executor altogether. It's not needed here; send the emails one by one (I suppose you're not trying to send millions of mail messages at once, right?).

2) Clean up old messages from the database on application start to resolve the disk space problems.
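One possible reading of point 1, sketched under assumptions: the threshold check and the send run on a single dedicated thread, so they can never interleave, while CSV parsing elsewhere is free to stay parallel. `SerialAlertSender`, `submit` and the key format are hypothetical names, the actual mail call is left as a comment, and the sketch covers a single JVM only.

```java
import java.util.HashMap;
import java.util.Map;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;
import java.util.concurrent.TimeUnit;

/** Hypothetical sender that decides and sends one alert at a time. */
class SerialAlertSender {
    // Exactly one worker thread: tasks run one by one in submission order
    private final ExecutorService sender = Executors.newSingleThreadExecutor();
    // Plain HashMap is enough because only the sender thread touches it
    private final Map<String, Long> lastSent = new HashMap<>();
    private final long thresholdMillis;

    SerialAlertSender(long thresholdMillis) {
        this.thresholdMillis = thresholdMillis;
    }

    /** Queues an alert; the future yields true if a mail was sent, false if suppressed. */
    Future<Boolean> submit(String key, long detectedAtMillis) {
        return sender.submit(() -> {
            Long previous = lastSent.get(key);
            if (previous != null && detectedAtMillis - previous <= thresholdMillis) {
                return false; // a mail for this key went out too recently
            }
            lastSent.put(key, detectedAtMillis);
            // the real mail sending call would go here
            return true;
        });
    }

    void shutdown() throws InterruptedException {
        sender.shutdown();
        sender.awaitTermination(10, TimeUnit.SECONDS);
    }

    public static void main(String[] args) throws Exception {
        SerialAlertSender alerts = new SerialAlertSender(5_000);
        Future<Boolean> first = alerts.submit("device-1|overheat", 0L);
        Future<Boolean> second = alerts.submit("device-1|overheat", 1_000L);
        Future<Boolean> third = alerts.submit("device-1|overheat", 6_000L);
        System.out.println(first.get() + " " + second.get() + " " + third.get());
        alerts.shutdown();
    }
}
```

Any number of parsing threads can call `submit`; only the decision and the send are serialized.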

Upvotes: 1
