Redth

Reputation: 5544

Solandra to replace our Lucene + RDBMS?

Currently we are using a combination of SQL Server and Lucene to index some relational data about domain names. We have a Domain table and about 10 other tables holding the histories of different metrics we calculate and store about domains. For example:

Domain

SeoScore

We are trying to include all the domains from major zone files in our database, so we are looking at about 600 million records eventually, which seems like it's going to be a bit of a chore to scale in SQL Server. Given our reliance on Lucene to do some pretty advanced queries, Solandra seems like it may be a nice fit. I am having a hard time not thinking about our data in relational database terms.

Each Domain would map to many SeoScore records (one record for each time we calculated the score). I'm thinking that in Solandra terms, the best way to achieve this would be to use two indexes, one for Domain and one for SeoScore.

Here are the querying scenarios we need to achieve:

  1. A 'current snapshot' of the latest metrics for each domain (so the latest SeoScore for a given domain). I'm assuming we would find the Domain records we want first, and then run further queries to get the latest snapshot of each metric separately.

  2. Domains that have IsTracked=1 and whose SeoScore has not been checked since a given datetime x, so we would know which ones need to be recalculated. We would need some sort of batching system here so we could 'check out' domains and run calculations on them without duplicating effort.

Am I way off track here? Would we be right in basically mapping our tables to separate indexes in Solandra in this case?

UPDATE

Here's some JSON notation of what I'm thinking:

Domains : { //Index
    domain1.com : { //Document ID
        Middle : "domain1", //Field
        Extension : "com",
        Created : '2011-01-01 01:01:01.000',
        ContainsDashes : false,
        ContainsNumbers : false,
        IsIDNA : false,
    },
    domain2.com : {
        ...
    }
}

SeoScores : { //Index
    domain1.com : { //Document ID
        '2011-02-01 01:01:01.000' : { 
            SeoScore: 3 
        },
        '2011-01-01 01:01:01.000' : {
            SeoScore: -1
        }
    },
    domain2.com : {
        ...
    }
}
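
To make the two query scenarios concrete, here is a rough Python sketch that models the layout above with plain dicts. Nothing in it is a Solandra or Lucene call; the helper names `latest_score` and `stale_tracked_domains`, and keeping `IsTracked` on the Domain document, are assumptions for illustration only.

```python
from datetime import datetime

# In-memory stand-ins for the two indexes (assumption: not Solandra API).
domains = {
    "domain1.com": {"Middle": "domain1", "Extension": "com", "IsTracked": True},
    "domain2.com": {"Middle": "domain2", "Extension": "com", "IsTracked": False},
}

# One entry per calculation time, keyed by domain and then by timestamp.
seo_scores = {
    "domain1.com": {
        datetime(2011, 2, 1, 1, 1, 1): {"SeoScore": 3},
        datetime(2011, 1, 1, 1, 1, 1): {"SeoScore": -1},
    },
    "domain2.com": {},
}


def latest_score(domain):
    """Scenario 1: return (timestamp, score) of the newest calculation, or None."""
    history = seo_scores.get(domain, {})
    if not history:
        return None
    newest = max(history)
    return newest, history[newest]["SeoScore"]


def stale_tracked_domains(cutoff):
    """Scenario 2: tracked domains never scored, or last scored before cutoff."""
    stale = []
    for name, doc in domains.items():
        if not doc.get("IsTracked"):
            continue
        latest = latest_score(name)
        if latest is None or latest[0] < cutoff:
            stale.append(name)
    return stale
```

The 'check out' step for batching is not modelled here; it would need some shared marker on each domain so that two workers do not pick up the same one.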

Upvotes: 2

Views: 432

Answers (1)

tjake

Reputation: 506

For SeoScores you might want to consider using virtual cores:

https://github.com/tjake/Solandra/wiki/ManagingCores

This lets you partition the data by domain, so you can have a core such as SeoScores.domain1 and make each document represent one timestamp.
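
As a rough illustration of that layout, the sketch below models one partition per domain in plain Python. It makes no Solandra calls; the `cores` dict and the helpers `core_name`, `add_score`, and `latest` are hypothetical names, and using the timestamp as the document id is an assumption.

```python
from collections import defaultdict

# Hypothetical in-memory model: virtual core name -> list of documents.
cores = defaultdict(list)


def core_name(index, partition):
    """Build a partitioned core name such as 'SeoScores.domain1'."""
    return f"{index}.{partition}"


def add_score(partition, timestamp, score):
    """Store one document per calculation time inside the domain's own core."""
    cores[core_name("SeoScores", partition)].append(
        {"id": timestamp, "SeoScore": score}
    )


def latest(partition):
    """Return the newest document in a domain's core, or None if it has none."""
    docs = cores.get(core_name("SeoScores", partition), [])
    # Timestamps in 'YYYY-MM-DD hh:mm:ss.fff' form sort correctly as strings.
    return max(docs, key=lambda d: d["id"], default=None)


add_score("domain1", "2011-01-01 01:01:01.000", -1)
add_score("domain1", "2011-02-01 01:01:01.000", 3)
```

With the data partitioned this way, fetching the current score for a domain only touches that domain's own documents.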

The rest sounds fine.

Upvotes: 2
