I am using prepared statements as described in the official Cassandra and Scylla documentation; however, inserting 100,000 messages still takes around 30 seconds. Any ideas how I can improve this?
query = "INSERT INTO message (id, message) VALUES (?, ?)"
prepared = session.prepare(query)
for key in range(100000):
    try:
        session.execute_async(prepared, (0, "my example message"))
    except Exception as e:
        print("An error occured : " + str(e))
        pass
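Two things stand out in the snippet above. `execute_async` returns a future immediately, so server-side errors such as write timeouts are raised only when the future's result is fetched, never inside that `try`. And every row uses `id` 0, so (assuming `id` is the primary key) all 100,000 writes overwrite the same row. The usual pattern is to keep a bounded number of writes in flight and check each result. The sketch below shows that idea in plain Python so it runs without a cluster; `fake_insert` is a hypothetical stand-in for a real write and `MAX_IN_FLIGHT` is an assumed tuning value.

```python
import time
from concurrent.futures import FIRST_COMPLETED, ThreadPoolExecutor, wait

MAX_IN_FLIGHT = 100  # assumed limit on pending writes; tune for your cluster


def fake_insert(key, message):
    """Hypothetical stand-in for session.execute(prepared, (key, message))."""
    time.sleep(0.001)  # simulate a network round trip
    if key % 250 == 249:
        # Simulate an occasional server-side write timeout.
        raise RuntimeError(f"simulated timeout for key {key}")
    return key


def insert_all(n, message="my example message"):
    """Write n rows with at most MAX_IN_FLIGHT pending; return (ok_count, errors)."""
    ok_count = 0
    errors = []

    def collect(finished):
        nonlocal ok_count
        for future in finished:
            try:
                future.result()  # errors surface here, not when the write is submitted
                ok_count += 1
            except Exception as exc:
                errors.append(exc)

    with ThreadPoolExecutor(max_workers=MAX_IN_FLIGHT) as pool:
        pending = set()
        for key in range(n):
            if len(pending) >= MAX_IN_FLIGHT:
                # Block until at least one write finishes before submitting more.
                finished, pending = wait(pending, return_when=FIRST_COMPLETED)
                collect(finished)
            # A distinct key per row, instead of the constant 0.
            pending.add(pool.submit(fake_insert, key, message))
        finished, _ = wait(pending)
        collect(finished)
    return ok_count, errors


ok, errors = insert_all(1000)
print(f"{ok} succeeded, {len(errors)} failed")  # prints: 996 succeeded, 4 failed
```

With the DataStax Python driver, `cassandra.concurrent.execute_concurrent_with_args(session, prepared, params, concurrency=100, raise_on_first_error=False)` implements the same bounded-window idea and returns a list of `(success, result_or_exc)` pairs.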
UPDATE
I read that batches are highly recommended for improving performance, so I combined prepared statements with batches according to the official documentation. My code currently looks like this:
from datetime import datetime
from cassandra import ConsistencyLevel
from cassandra.query import BatchStatement

print("time 0: " + str(datetime.now()))
query = "INSERT INTO message (id, message) VALUES (uuid(), ?)"
prepared = session.prepare(query)
for key in range(100):
    print(key)
    try:
        batch = BatchStatement(consistency_level=ConsistencyLevel.QUORUM)
        for _ in range(100):
            batch.add(prepared, ("example message",))
        session.execute(batch)
    except Exception as e:
        print("An error occured : " + str(e))
        pass
print("time 1: " + str(datetime.now()))
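A likely contributor to both the slowness and the timeouts: because `uuid()` gives every row its own partition, each 100-statement batch spans up to 100 partitions. `BatchStatement` defaults to a logged batch, so the coordinator first writes the batch to the batchlog and then fans the writes out to all affected replicas, each of which must reach QUORUM. Batching mainly pays off when the statements share a partition key; otherwise individual concurrent writes are usually faster. The sketch below shows only the grouping step in plain Python; `partition_batches`, its `(partition_key, value)` row format, the `max_batch` limit and the `"chat-1"` key are hypothetical.

```python
from collections import defaultdict


def partition_batches(rows, max_batch=100):
    """Group (partition_key, value) rows into chunks that each touch one partition.

    Hypothetical helper: with the DataStax driver, each yielded chunk would go
    into its own BatchStatement(batch_type=BatchType.UNLOGGED).
    """
    by_partition = defaultdict(list)
    for partition_key, value in rows:
        by_partition[partition_key].append(value)
    for partition_key, values in by_partition.items():
        # Split large partitions so no batch exceeds max_batch statements.
        for start in range(0, len(values), max_batch):
            yield partition_key, values[start:start + max_batch]


# Rows sharing a partition key collapse into a few batches...
shared = list(partition_batches([("chat-1", f"msg {i}") for i in range(250)]))
print([len(chunk) for _, chunk in shared])  # [100, 100, 50]

# ...but with a unique key per row (like uuid()), every "batch" holds one row.
unique = list(partition_batches([(i, "example message") for i in range(5)]))
print([len(chunk) for _, chunk in unique])  # [1, 1, 1, 1, 1]
```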
Do you have any idea why performance is so slow, and why running this code produces the output below?
test 0: 2018-06-19 11:10:13.990691
0
1
...
41
An error occured : Error from server: code=1100 [Coordinator node timed out waiting for replica nodes' responses] message="Operation timed out for messages.message - received only 1 responses from 2 CL=QUORUM." info={'write_type': 'BATCH', 'required_responses': 2, 'consistency': 'QUORUM', 'received_responses': 1}
42
...
52
An error occured : errors={'....0.3': 'Client request timeout. See Session.execute[_async](timeout)'}, last_host=.....0.3
53
An error occured : Error from server: code=1100 [Coordinator node timed out waiting for replica nodes' responses] message="Operation timed out for messages.message - received only 1 responses from 2 CL=QUORUM." info={'write_type': 'BATCH', 'required_responses': 2, 'consistency': 'QUORUM', 'received_responses': 1}
54
...
59
An error occured : Error from server: code=1100 [Coordinator node timed out waiting for replica nodes' responses] message="Operation timed out for messages.message - received only 1 responses from 2 CL=QUORUM." info={'write_type': 'BATCH', 'required_responses': 2, 'consistency': 'QUORUM', 'received_responses': 1}
60
61
62
...
69
70
71
An error occured : errors={'.....0.2': 'Client request timeout. See Session.execute[_async](timeout)'}, last_host=.....0.2
72
An error occured : errors={'....0.1': 'Client request timeout. See Session.execute[_async](timeout)'}, last_host=....0.1
73
74
...
98
99
test 1: 2018-06-19 11:11:03.494957
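The `code=1100` errors are write timeouts: at QUORUM the coordinator needs acknowledgements from a majority of replicas, `RF // 2 + 1`, and only one arrived in time. The keyspace's replication factor isn't shown here; evaluating the formula shows that both RF=2 and RF=3 give the `required_responses: 2` seen in the log. With RF=2, QUORUM means every replica must answer, so a single slow node is enough to fail the write.

```python
def quorum(replication_factor):
    """Replica acks needed at CL=QUORUM; replication_factor is summed over all data centers."""
    return replication_factor // 2 + 1


for rf in (1, 2, 3, 5):
    print(f"RF={rf}: QUORUM needs {quorum(rf)} replica(s)")
```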
On my local machine I get sub-second execution times for this kind of workload by heavily parallelizing the inserts:
➜ loadz ./loadz
execution time: 951.701622ms
I'm afraid I don't know how to do it in Python, but in Go it can look something like this:
package main

import (
	"fmt"
	"sync"
	"time"

	"github.com/gocql/gocql"
)

func main() {
	cluster := gocql.NewCluster("127.0.0.1")
	cluster.Keyspace = "mykeyspace"
	session, err := cluster.CreateSession()
	if err != nil {
		panic(err)
	}
	defer session.Close()

	workers := 1000
	ch := make(chan *gocql.Query, 100001)
	wg := &sync.WaitGroup{}
	wg.Add(workers)
	for i := 0; i < workers; i++ {
		go func() {
			defer wg.Done()
			for q := range ch {
				if err := q.Exec(); err != nil {
					fmt.Println(err)
				}
			}
		}()
	}

	start := time.Now()
	for i := 0; i < 100000; i++ {
		ch <- session.Query("INSERT INTO message (id,message) VALUES (uuid(),?)", "the message")
	}
	close(ch)
	wg.Wait()
	dur := time.Since(start)
	fmt.Printf("execution time: %s\n", dur)
}
Adjust the connection parameters as needed if you want to test it.
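For Python users, a rough analogue of the Go worker pool uses threads fed from a `queue.Queue`: the queue plays the channel, one `None` sentinel per worker plays `close(ch)`, and `join()` plays `wg.Wait()`. This is a sketch under assumptions: `run_workers`, the worker count of 50 and `fake_insert` are hypothetical, and in real use `insert` would wrap your own write call. In practice the driver's `execute_async` or `cassandra.concurrent` helpers reach the same parallelism without a thread per request; this version only mirrors the Go structure.

```python
import queue
import threading
import time


def run_workers(jobs, insert, workers=50):
    """Run insert(job) for every job on a fixed pool of threads (jobs must not contain None)."""
    jobs_q = queue.Queue()
    errors = []
    errors_lock = threading.Lock()

    def worker():
        while True:
            job = jobs_q.get()
            if job is None:  # sentinel: no more work for this thread, like close(ch)
                return
            try:
                insert(job)
            except Exception as exc:
                with errors_lock:
                    errors.append(exc)

    threads = [threading.Thread(target=worker) for _ in range(workers)]
    for t in threads:
        t.start()
    start = time.perf_counter()
    for job in jobs:
        jobs_q.put(job)
    for _ in threads:
        jobs_q.put(None)
    for t in threads:
        t.join()  # like wg.Wait()
    return time.perf_counter() - start, errors


# Demo with a hypothetical insert that only counts calls.
count = 0
count_lock = threading.Lock()


def fake_insert(message):
    global count
    with count_lock:
        count += 1


elapsed, errors = run_workers(["the message"] * 10_000, fake_insert)
print(f"execution time: {elapsed:.3f}s, {count} inserts, {len(errors)} errors")
```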