fasouto

Reputation: 4511

Cassandra low performance?

I have to choose between Cassandra and MongoDB (or another NoSQL database; I'm open to suggestions) for a project with a lot of inserts (1M/day). So I created a small test to measure write performance. Here's the code to insert into Cassandra:

import time
import os
import random
import string
import pycassa

def get_random_string(string_length):
    return ''.join(random.choice(string.letters) for i in xrange(string_length))

def connect():
    """Connect to a test database"""
    connection = pycassa.connect('test_keyspace', ['localhost:9160'])
    db = pycassa.ColumnFamily(connection,'foo')
    return db

def random_insert(db):
    """Insert a record into the database. The record has the following format
    ID timestamp
    4 random strings
    3 random integers"""
    record = {}
    record['id'] = str(time.time())
    record['str1'] = get_random_string(64)
    record['str2'] = get_random_string(64)
    record['str3'] = get_random_string(64)
    record['str4'] = get_random_string(64)
    record['num1'] = str(random.randint(0, 100))
    record['num2'] = str(random.randint(0, 1000))
    record['num3'] = str(random.randint(0, 10000))
    db.insert(str(time.time()), record)

if __name__ == "__main__":
    db = connect()
    start_time = time.time()
    for i in range(1000000):
        random_insert(db)
    end_time = time.time()
    print "Insert time: %lf " %(end_time - start_time)

The code to insert into Mongo is the same except for the connection function (and importing pymongo instead of pycassa):

def connect():
    """Connect to a test database"""
    connection = pymongo.Connection('localhost', 27017)
    db = connection.test_insert
    return db.foo2

The results are ~1046 seconds to finish the inserts in Cassandra and ~437 seconds in Mongo. Cassandra is supposed to be much faster than Mongo at inserting data. So what am I doing wrong?

Upvotes: 8

Views: 3826

Answers (5)

warvariuc

Reputation: 59604

Create a batch mutator to perform multiple insert, update, and remove operations in as few round trips as possible.

http://pycassa.github.com/pycassa/api/pycassa/columnfamily.html#pycassa.columnfamily.ColumnFamily.batch

Using a batch mutator cut my insert time by at least half.
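
A minimal sketch of what that could look like, assuming column_family is a pycassa.ColumnFamily like the one returned by connect() in the question. The helper name batched_insert and the queue size of 100 are illustrative choices, not something taken from the pycassa docs or from the question:

```python
def batched_insert(column_family, rows, queue_size=100):
    """Insert (key, columns) pairs through a pycassa batch mutator.

    `column_family` is assumed to be a pycassa.ColumnFamily. Mutations are
    queued on the client and sent in one round trip every `queue_size`
    inserts. Returns the number of rows queued.
    """
    batch = column_family.batch(queue_size=queue_size)
    count = 0
    for key, columns in rows:
        batch.insert(key, columns)  # queued; sent when the queue fills up
        count += 1
    batch.send()  # flush whatever is still queued
    return count
```

In the question's test this would replace the per-record db.insert(...) call: build the (key, record) pairs in a generator and hand them to batched_insert(db, pairs). Larger queue sizes mean fewer round trips but bigger requests, so the best value is something to measure rather than guess.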

Upvotes: 1

Perry Krug

Reputation: 104

Might I suggest taking a look at Membase here? It's used in exactly the same way as memcached and is fully distributed so you can continuously scale your write input rate simply by adding more servers and/or more RAM.

For this case, you'll definitely want to go with a client-side Moxi to give you the best performance. Take a look at our wiki: wiki.membase.org for examples and let me know if you need any further instruction...I'm happy to walk you through it and I'm certain that Membase can handle this load easily.

Upvotes: 1

Ravindra

Reputation: 353

You will harness the true power of Cassandra only once you have multiple nodes running. Any node can take a write request. Multithreading the client just floods more requests to the same instance, which will not help beyond a certain point.

  • Check the Cassandra log for events that happen during your tests. Cassandra initiates a disk write once the Memtable is full (this is configurable; make it large enough and you will be dealing only with RAM plus the commit log's disk writes). If a Memtable flush to disk happens during your test, it will slow the test down. I do not know when MongoDB writes to disk.

Upvotes: 2

jbellis

Reputation: 19377

There is no equivalent to Mongo's unsafe mode in Cassandra. (We used to have one, but we took it out, because it's just a Bad Idea.)

The other main problem is that you're doing single-threaded inserts. Cassandra is designed for high concurrency; you need to use a multithreaded test. See the graph at the bottom of http://spyced.blogspot.com/2010/01/cassandra-05.html (actual numbers are over a year out of date but the principle is still true).

The Cassandra source distribution has such a test included in contrib/stress.
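
For a quick experiment with the script from the question, a multithreaded driver could look roughly like the sketch below. The function threaded_insert and the default of 8 threads are made up for illustration. Each worker opens its own connection by calling connect(), so the sketch does not rely on any assumption about sharing one client object between threads:

```python
import threading


def threaded_insert(connect, insert_one, total, num_threads=8):
    """Spread `total` inserts over `num_threads` worker threads.

    `connect` is a zero-argument function returning a database handle and
    `insert_one(handle)` performs a single insert, like connect() and
    random_insert() in the question.
    """
    def worker(count):
        db = connect()  # one connection per thread
        for _ in range(count):
            insert_one(db)

    base, extra = divmod(total, num_threads)
    threads = []
    for n in range(num_threads):
        # The first `extra` workers take one more insert so the counts add up.
        count = base + (1 if n < extra else 0)
        thread = threading.Thread(target=worker, args=(count,))
        thread.start()
        threads.append(thread)
    for thread in threads:
        thread.join()
```

With the question's code, the timed loop becomes threaded_insert(connect, random_insert, 1000000). In CPython the GIL still serializes the random-string generation, so any speedup comes from overlapping the network waits, not from building records in parallel.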

Upvotes: 12

Bryan Migliorisi

Reputation: 9210

If I am not mistaken, Cassandra allows you to specify whether or not you are doing a MongoDB-equivalent "safe mode" insert. (I don't recall the name of that feature in Cassandra.)

In other words, Cassandra may be configured to write to disk and then return, as opposed to the default MongoDB configuration, which returns immediately after performing an insert without knowing whether the insert succeeded. It just means that your application never waits for a pass/fail from the server.

You can change that behavior by using safe mode in MongoDB, but this is known to have a large impact on performance. Enable safe mode and you may see different results.
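
A rough sketch of how to compare the two modes, assuming the older pymongo API used in the question (releases before 3.0), where insert() accepts a safe keyword. The helper name timed_mongo_inserts is hypothetical, and newer pymongo releases no longer have this keyword:

```python
import time


def timed_mongo_inserts(collection, records, safe):
    """Insert each record and return the elapsed time in seconds.

    `collection` is assumed to be a collection object from a pre-3.0
    pymongo, such as the one returned by connect() in the question.
    With safe=True each insert waits for the server's acknowledgement.
    """
    start = time.time()
    for record in records:
        # Copy the dict because insert() adds an _id field to what it is given.
        collection.insert(dict(record), safe=safe)
    return time.time() - start
```

Running it once with safe=False and once with safe=True on the same list of records shows how much of Mongo's lead comes from not waiting for the server.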

Upvotes: 4
