Amr Sharaf

Reputation: 86

Scalable automatic email classification service

We're currently working on an application that lets users register one or more email accounts so that their emails can be classified automatically. The front-end is implemented in Ruby, while the back-end (the email classifier) is written in Java and uses the WEKA API. How can we integrate the front-end (a web interface written in Ruby) with the back-end (an email classifier written in Java) in a scalable way, that is, one that handles a large number of simultaneous users?

Upvotes: 3

Views: 510

Answers (3)

David Weiser

Reputation: 5195

As the amount of data you use to train the classifier grows, you may want to use an ensemble algorithm: a group of n nodes forms the ensemble, and the training data is split up across those n nodes.

To classify a new datapoint, you can use a voting system where each of the n nodes gets to "vote" on what the new datapoint should be classified as. The classification with the most votes wins.
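
The voting step described above can be sketched roughly as follows. This is a minimal illustration, not the WEKA API: each ensemble member is modeled as a plain `Function<String, String>` from email text to a label, and the class name `MajorityVote`, the labels, and the keyword rules in `main` are all hypothetical stand-ins for real trained classifiers.

```java
import java.util.HashMap;
import java.util.List;
import java.util.Map;
import java.util.function.Function;

public class MajorityVote {
    // Each member maps an email's text to a label (hypothetical stand-in for a trained model).
    public static String classify(List<Function<String, String>> members, String email) {
        // Count one vote per ensemble member.
        Map<String, Integer> votes = new HashMap<>();
        for (Function<String, String> member : members) {
            votes.merge(member.apply(email), 1, Integer::sum);
        }
        // Pick the label with the most votes; ties go to the alphabetically
        // smaller label so the result is deterministic (an arbitrary choice).
        String best = null;
        int bestCount = -1;
        for (Map.Entry<String, Integer> entry : votes.entrySet()) {
            int count = entry.getValue();
            if (count > bestCount
                    || (count == bestCount && entry.getKey().compareTo(best) < 0)) {
                best = entry.getKey();
                bestCount = count;
            }
        }
        return best; // null if the ensemble is empty
    }

    public static void main(String[] args) {
        // Made-up keyword rules standing in for classifiers trained on separate data shards.
        List<Function<String, String>> members = List.of(
                text -> text.contains("invoice") ? "work" : "personal",
                text -> text.contains("meeting") ? "work" : "personal",
                text -> text.length() > 200 ? "newsletter" : "personal");
        System.out.println(classify(members, "invoice for the meeting"));
    }
}
```

In a distributed setup each member would be a remote call to one of the n nodes rather than a local function, but the counting logic would stay the same.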

Upvotes: 0

Richard Wise

Reputation: 1

If you want a new-email alert, just reverse which RESTful API you expose. Instead of exposing the Java app as a RESTful API, expose an API on the Rails app, for example /user/ID/newmail.

The Java app would then call the Rails app when a new email arrives.
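
A rough sketch of that callback from the Java side, using the JDK's built-in `java.net.http.HttpClient` (Java 11+). The path follows the /user/ID/newmail example above; the class name `NewMailNotifier`, the choice of POST with an empty body, and the base URL are assumptions, since the thread does not specify them.

```java
import java.io.IOException;
import java.net.URI;
import java.net.http.HttpClient;
import java.net.http.HttpRequest;
import java.net.http.HttpResponse;

public class NewMailNotifier {
    // Builds POST {baseUrl}/user/{userId}/newmail.
    // POST with no body is an assumption; the Rails route may expect something else.
    public static HttpRequest buildRequest(String baseUrl, long userId) {
        URI uri = URI.create(baseUrl + "/user/" + userId + "/newmail");
        return HttpRequest.newBuilder(uri)
                .POST(HttpRequest.BodyPublishers.noBody())
                .build();
    }

    // Sends the notification to the Rails app and returns the HTTP status code.
    public static int notifyRails(HttpClient client, String baseUrl, long userId)
            throws IOException, InterruptedException {
        HttpResponse<Void> response = client.send(
                buildRequest(baseUrl, userId),
                HttpResponse.BodyHandlers.discarding());
        return response.statusCode();
    }
}
```

The Java app would call something like `notifyRails(HttpClient.newHttpClient(), "http://localhost:3000", 42)` whenever it detects a new message for user 42, where the host and port are placeholders.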

Btw:

How did you implement a scalable system in Java for checking 1000s of email accounts?

Upvotes: 0

Amir Raminfar

Reputation: 34169

I am not sure what an email classifier is, but for any similar problem I recommend creating a RESTful API for your Java service. This can be done very elegantly with the right tools. The API should run over HTTP and return JSON; use a library like Jackson to serialize to JSON.
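
As a minimal sketch of such a service, the example below uses only the JDK's built-in `com.sun.net.httpserver.HttpServer` and builds the JSON by hand so that it runs without dependencies; in a real service a library such as Jackson would handle serialization. The `/classify` path, the `label` field, the class name `ClassifierApi`, and the keyword rule standing in for the WEKA classifier are all hypothetical.

```java
import com.sun.net.httpserver.HttpServer;
import java.io.IOException;
import java.io.OutputStream;
import java.net.InetSocketAddress;
import java.nio.charset.StandardCharsets;

public class ClassifierApi {
    // Placeholder for the real WEKA-backed classifier; this keyword rule is made up.
    public static String classify(String emailText) {
        return emailText.toLowerCase().contains("invoice") ? "work" : "personal";
    }

    // Hand-built JSON to keep the sketch dependency-free. Only backslashes and
    // double quotes are escaped, which is enough for simple labels.
    public static String toJson(String label) {
        String escaped = label.replace("\\", "\\\\").replace("\"", "\\\"");
        return "{\"label\":\"" + escaped + "\"}";
    }

    // Starts an HTTP server that classifies the request body sent to /classify.
    public static HttpServer start(int port) throws IOException {
        HttpServer server = HttpServer.create(new InetSocketAddress(port), 0);
        server.createContext("/classify", exchange -> {
            String body = new String(
                    exchange.getRequestBody().readAllBytes(), StandardCharsets.UTF_8);
            byte[] json = toJson(classify(body)).getBytes(StandardCharsets.UTF_8);
            exchange.getResponseHeaders().set("Content-Type", "application/json");
            exchange.sendResponseHeaders(200, json.length);
            try (OutputStream out = exchange.getResponseBody()) {
                out.write(json);
            }
        });
        server.start();
        return server;
    }

    public static void main(String[] args) throws IOException {
        start(8080); // port chosen arbitrarily for the example
    }
}
```

Because the handler keeps no per-request state, several copies of this process could sit behind a load balancer, which is the scaling approach the answer has in mind.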

On the Ruby side, you can easily parse that JSON and deserialize it.

This is a very scalable solution because HTTP calls are stateless: a thread handles the request and is then released. If you need more power, just add more machines.

The Rails app can also start caching some calls. But that is premature optimization.

If there is no logic and only a common database, then just share that database between the two apps. But it sounds like the Java app needs to do some work, and an API is a common approach for that. It also doesn't limit you to Ruby: you can create a JSONP service for AJAX or for any other client that understands JSON.

Upvotes: 1
