Operation timed out error in Cassandra cluster

Question

My cluster has 6 machines, and I often receive the error message below. I don't know how to solve it:

code=1100 [Coordinator node timed out waiting for replica nodes' responses] message="Operation timed out - received only 0 responses." info={'received_responses': 0, 'required_responses': 1, 'consistency': 'LOCAL_ONE'}

My complete code is further down. The error occurs at this point:

batch.add(schedule_remove_stmt, (source, type, row['scheduled_for'], row['id']))
session.execute(batch, 30)

Complete code:

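import logging
from datetime import datetime, timedelta

from cassandra.cluster import Cluster
from cassandra.query import BatchStatement

log = logging.getLogger(__name__)

cluster = Cluster(['localhost'])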
session = cluster.connect('keyspace')
d = datetime.utcnow()
scheduled_for = d.replace(second=0, microsecond=0)
rowid=[]
stmt = session.prepare('SELECT * FROM schedules WHERE source=? AND type= ? AND scheduled_for = ?')
schedule_remove_stmt = session.prepare("DELETE FROM schedules WHERE source = ? AND type = ? AND scheduled_for = ? AND id = ?")
schedule_insert_stmt = session.prepare("INSERT INTO schedules(source, type, scheduled_for, id) VALUES (?, ?, ?, ?)")
schedules_to_delete = []
articles={}
source=''
type=''
try:
    rows = session.execute(stmt, [source,type, scheduled_for])
    article_schedule_delete = ''
    for row in rows:
        schedules_to_delete.append({'id':row.id,'scheduled_for':row.scheduled_for})
        article_schedule_delete=article_schedule_delete+'\''+row.id+'\','
        rowid.append(row.id)
    article_schedule_delete = article_schedule_delete[0:-1]
    cql = 'SELECT * FROM articles WHERE id in (%s)' % article_schedule_delete
    articles_row = session.execute(cql)
    for row in articles_row:
        articles[row.id]=row.created_at
except Exception as e:
    print e
    log.info('select error is:%s' % e)
try:
    for row in schedules_to_delete:
        batch = BatchStatement()
        batch.add(schedule_remove_stmt, (source, type, row['scheduled_for'],row['id']))
        try:
            if row['id'] in articles.keys():
                next_schedule =d
                elapsed = datetime.utcnow() - articles[row['id']]
                if elapsed <= timedelta(hours=1):
                    next_schedule += timedelta(minutes=6)
                elif elapsed <= timedelta(hours=3):
                    next_schedule += timedelta(minutes=18)
                elif elapsed <= timedelta(hours=6):
                    next_schedule += timedelta(minutes=36)
                elif elapsed <= timedelta(hours=12):
                    next_schedule += timedelta(minutes=72)
                elif elapsed <= timedelta(days=1):
                    next_schedule += timedelta(minutes=144)
                elif elapsed <= timedelta(days=3):
                    next_schedule += timedelta(minutes=432)
                elif elapsed <= timedelta(days=30) :
                    next_schedule += timedelta(minutes=1440)
                if not next_schedule==d:
                    batch.add(schedule_insert_stmt, (source,type, next_schedule.replace(second=0, microsecond=0),row['id']))
                    #log.info('schedule id:%s' % row['id'])
        except Exception as e:
            print 'key error:',e
            log.info('HOW IT CHANGES %s %s %s %s ERROR:%s' % (source,type, next_schedule.replace(second=0, microsecond=0), row['id'],e))
        session.execute(batch,30)
except Exception as e:
    print 'schedules error is =======================>',e
    log.info('schedules error is:%s' % e)

Thanks a lot for the help. I really don't know how to solve this!

Mikhail Baksheev · Accepted Answer

I think you shouldn't use a batch statement in this case. You are trying to use batches to perform a large number of operations across different partition keys, and that leads to timeout exceptions. Batches are meant for keeping tables in sync, not for performance optimization. You can find more about misusing batches in this article

In your case, the driver's asynchronous API is better suited to performing a lot of delete queries. It lets you keep the performance of your code while avoiding coordinator overload.
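As a rough illustration of the asynchronous approach, here is a minimal sketch. It assumes the DataStax Python driver, whose `Session.execute_async()` returns a future with a `result()` method. The helper name `run_in_windows` and the `window` size of 50 are my own assumptions, not part of the driver, and would need tuning for your cluster.

```python
def run_in_windows(session, statement, params_list, window=50):
    """Run one single-partition statement per parameter tuple.

    At most `window` requests are in flight at a time, so the
    coordinator is not flooded. Returns a list of (params, exception)
    pairs for the requests that failed.
    """
    failures = []
    for start in range(0, len(params_list), window):
        chunk = params_list[start:start + window]
        # execute_async() returns immediately with a future per request.
        futures = [(params, session.execute_async(statement, params))
                   for params in chunk]
        # Wait for the whole window before sending the next one.
        for params, future in futures:
            try:
                future.result()
            except Exception as exc:  # e.g. a timeout on a single request
                failures.append((params, exc))
    return failures
```

With the names from the question, this could be called as `run_in_windows(session, schedule_remove_stmt, [(source, type, r['scheduled_for'], r['id']) for r in schedules_to_delete])`, and the inserts could be sent the same way with `schedule_insert_stmt`. Note that sending the delete and the insert as separate requests gives up the atomicity the batch provided, so failed requests returned by the helper would need to be retried or logged. The driver also ships `cassandra.concurrent.execute_concurrent_with_args`, which covers a similar use case.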
