Reputation: 7949
Cassandra doesn't have anything in CQL like MySQL's LIKE clause for searching for more specific data in the database.
I have looked through some material and came up with a couple of ideas:
1. Using Hadoop
2. Using a MySQL server as another database server
But is there an easier way to improve what I can do with my Cassandra DB?
Upvotes: 1
Views: 207
Reputation: 10216
Improving your Cassandra DB's performance can be done in many ways, but I feel what you actually need is to query the data efficiently, which has nothing to do with performance tweaks on the db itself.
As you know, Cassandra is a NoSQL database, which means that when dealing with it you are sacrificing query flexibility for fast reads/writes, scalability, and fault tolerance. That means querying the data is slightly harder. There are several patterns that can help you query it:
Know what you need in advance. Since querying with CQL is slightly less flexible than what you would find in an RDBMS engine, you can take advantage of the fast reads/writes and duplicate the data, saving it in the shape your queries need. Too complex?
Imagine you have a user entity that looks like this:
{
    "pk": "someTimeUUID",
    "name": "someName",
    "address": "address",
    "birthDate": "someBirthDate"
}
If you persist users like that, you will get a list of users sorted in the order they joined your db (i.e., the order you persisted them). Let's assume you want the same list of users, but only those named "John". It is possible to do that with CQL, but it is rather inefficient. What you can do to amend this is de-normalize your data, duplicating it so it fits the query you are going to execute over it. You can read more about this here:
http://arin.me/blog/wtf-is-a-supercolumn-cassandra-data-model
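For example, here is a minimal CQL sketch of that denormalization (the users_by_name table and its columns are hypothetical, named just to illustrate the pattern). The application writes every user to both the original table and this query-specific one:

    -- extra table shaped for the "which users are named John" query;
    -- partitioned by name, rows within one name sorted by join time
    CREATE TABLE users_by_name (
        name text,
        pk timeuuid,
        address text,
        birthDate text,
        PRIMARY KEY (name, pk)
    );

    -- on every user insert, duplicate the row here as well
    INSERT INTO users_by_name (name, pk, address, birthDate)
    VALUES ('John', now(), 'someAddress', 'someBirthDate');

    -- the "find all Johns" query is now a cheap single-partition read
    SELECT * FROM users_by_name WHERE name = 'John';

The writes are duplicated, but that is the trade Cassandra encourages: storage is cheap, and reads stay fast.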
However, while this approach is fine for simple queries, it is hard to pull off for complex ones, and if you are unsure what you will query in advance, there is no way to store the data in the proper shape beforehand.
Hadoop comes to the rescue. As you know, you can use Hadoop's map/reduce to solve tasks involving large amounts of data, and Cassandra data, in my experience, can become very, very large. With Hadoop, to solve the example above, you would iterate over the data as it is and, in each map call, check whether the user is named John; if so, write it to the context.
Here is how the pseudocode would look:
map(data) {
    if ("John".equals(data.getColumn("name"))) {
        context.write(data);
    }
}
At the end of the map phase, you would end up with a list of all users named John. You could also put a time range (a range slice) on the data you feed to Hadoop, which would give you all the users who joined your database over a certain period and are named John. As you can see, this leaves you with a lot more flexibility, and you can do virtually anything. If the resulting data is small enough, you could put it in some RDBMS as summary data, or cache it somewhere, so further queries for the same data can retrieve it easily. You can read more about Hadoop's map/reduce model in the Hadoop documentation.
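As a footnote, if you had denormalized into the hypothetical users_by_name table sketched earlier, the same "Johns who joined over a certain period" question becomes a plain CQL range read over the clustering column, assuming a Cassandra version whose CQL supports the minTimeuuid/maxTimeuuid functions:

    -- minTimeuuid/maxTimeuuid turn timestamps into timeuuid bounds
    SELECT * FROM users_by_name
    WHERE name = 'John'
      AND pk > minTimeuuid('2013-01-01')
      AND pk < maxTimeuuid('2013-06-30');

So the two approaches complement each other: denormalize for the queries you know about in advance, and fall back to Hadoop for the ones you don't.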
Upvotes: 1