Oak

Reputation: 69

Python multiprocessing batch insert in Cassandra, no performance improvement

I tried batch inserts in a single process and with multiprocessing, but both took the same amount of time; I didn't get any performance improvement. The Cassandra keyspace uses SimpleStrategy, and I think the cluster has only one node. Could either of these be the cause?

This is my multiprocessing code; could you help me find what is wrong?

import csv
import time
from multiprocessing import Lock, Pool, Value

from cassandra.cluster import Cluster
from cassandra.query import BatchStatement
from tqdm import tqdm

lock = Lock()
ID = Value('i', 0)
# files: list of CSV paths, defined elsewhere

def copy(x):
    cluster = Cluster()
    session = cluster.connect('test')
    count = 0

    insertt = session.prepare(
        "INSERT INTO table2(id, age, gender, name) VALUES (?, ?, ?, ?)")
    batch = BatchStatement()

    for i in x:
        with open(files[i]) as csvfile:
            reader = csv.reader(csvfile, delimiter=',')
            for row in tqdm(reader):
                with lock:
                    ID.value += 1
                    row_id = ID.value  # read while still holding the lock
                batch.add(insertt, (row_id, int(row[3]), row[2], row[1]))
                count += 1
                if count == 60:  # flush every 60 rows
                    session.execute(batch)
                    batch = BatchStatement()
                    count = 0
    if count:  # flush any leftover rows
        session.execute(batch)

if __name__ == '__main__':
    start = time.time()
    with Pool() as p:
        p.map(copy, [range(0, 6), range(6, 12), range(12, 18), range(18, 24)])
    print(time.time() - start)

Upvotes: 0

Views: 384

Answers (1)

Chris Lohfink

Reputation: 16410

Batches are not there to improve performance; quite the opposite, really. Logged batches in particular (which is what you're using here) cost more than 2x a normal write. An unlogged batch may improve performance slightly, but only if all the data in the batch belongs to the same partition.

In this particular example your throughput will also be limited by how fast your CSV reader can pull from disk. Since it's blocking, that is probably one of the primary drags on throughput. You can also use execute_async so you don't block building the next batch (though, once again, you shouldn't use a batch here) on the completion of the previous one.
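For reference, a minimal sketch of that async pattern with the Python driver. The `load_rows` helper and the `window` parameter are my own names, not part of the driver; `session` and `insert_stmt` are assumed to come from `cluster.connect(...)` and `session.prepare(...)` as in the question. Capping the number of in-flight futures keeps you from overwhelming the cluster:

```python
def load_rows(session, insert_stmt, rows, window=500):
    """Fire writes with execute_async, keeping at most `window` in flight.

    Instead of blocking on every row (session.execute) or building large
    batches, each row is submitted immediately and we only wait once the
    in-flight window fills up.
    """
    futures = []
    for row in rows:
        futures.append(session.execute_async(insert_stmt, row))
        if len(futures) >= window:
            for f in futures:
                f.result()  # block here; surfaces any write error
            futures = []
    for f in futures:  # drain the final partial window
        f.result()
```

Waiting for the whole window at once is the simplest scheme; for steadier throughput you could instead wait only for the oldest future before submitting the next row.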

Upvotes: 1
