Insert a large amount of data into Cassandra efficiently

Question

I want to insert around 50 million rows (~30 columns each) into Cassandra. I currently have only one node.

I query my data from another data source and store the result in a table object. I then iterate over it, parse each row individually, and add it to the mutator. Currently I insert 100 rows at a time, and 1 million rows takes 40 minutes! How can I speed this up? (I have also tried client.batch_mutate(), but it seems to hit a connection reset error after a few thousand inserts with a block size of 2.)

From searching around, I see that multi-threading could help, but I could not find any examples. Could someone point me to one? Thank you!

My current code:

        List<String> colNames = new ArrayList<String>();
        List<String> colValues = new ArrayList<String>();
        SomeTable result = Query(...); // this contains my result set of 1M rows initially

        for (Iterator itr = result.getRecordIterator(); itr.hasNext();) {
            String colName = .....
            String colValue = .....

            int colCount = colNames.size(); // 100 * 30

            for (int i = 0; i < colCount; i++) {
                // add row keys and columns to the mutator
                mutator.addInsertion(String.valueOf(rowCounter), "data", HFactory.createStringColumn(colNames.get(i), colValues.get(i)));
            }
            rowCounter++;

            // execute the mutator once per block of 100 rows
            if (rowCounter % 100 == 0) {
                mutator.execute();
                // clear data and start a fresh mutator for the next block
                colNames = new ArrayList<String>();
                colValues = new ArrayList<String>();
                mutator = HFactory.createMutator(keyspace, stringSerializer);
            }
        }
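
For reference, the multi-threaded idea mentioned above might look roughly like the sketch below. It is only an illustration of the general pattern (split the rows into batches, write each batch on a pool thread), not Hector-specific code. The class name `ParallelBatchWriter`, the method `writeInParallel`, and the `batchWriter` callback are all hypothetical; in the setup above, the callback would presumably create its own mutator, add the insertions for its batch, and call `execute()`. Whether this actually speeds up a single-node cluster is something that would need to be measured.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;
import java.util.concurrent.atomic.AtomicInteger;
import java.util.function.Consumer;

public class ParallelBatchWriter {

    /**
     * Splits rows into batches of batchSize and hands each batch to batchWriter
     * on a fixed-size thread pool. Returns the number of batches submitted.
     * batchWriter is a placeholder (assumption) for the real per-batch insert.
     */
    public static <T> int writeInParallel(List<T> rows, int batchSize, int threads,
                                          Consumer<List<T>> batchWriter) throws Exception {
        ExecutorService pool = Executors.newFixedThreadPool(threads);
        List<Future<?>> pending = new ArrayList<>();
        try {
            for (int start = 0; start < rows.size(); start += batchSize) {
                int end = Math.min(start + batchSize, rows.size());
                // Copy the slice so each task owns its batch independently.
                List<T> batch = new ArrayList<>(rows.subList(start, end));
                pending.add(pool.submit(() -> batchWriter.accept(batch)));
            }
            // Wait for every batch; get() rethrows a failure from the worker.
            for (Future<?> f : pending) {
                f.get();
            }
        } finally {
            pool.shutdown();
        }
        return pending.size();
    }

    public static void main(String[] args) throws Exception {
        // Stand-in data: integers instead of real rows.
        List<Integer> rows = new ArrayList<>();
        for (int i = 0; i < 1050; i++) {
            rows.add(i);
        }
        AtomicInteger written = new AtomicInteger();
        // Stand-in writer: only counts rows instead of inserting them.
        int batches = writeInParallel(rows, 100, 4, batch -> written.addAndGet(batch.size()));
        System.out.println(batches + " batches, " + written.get() + " rows");
    }
}
```

Note that this sketch queues every batch up front, which is fine for a test but would hold all 50 million rows in memory; for the real load, the rows would more likely be read and submitted incrementally, with some limit on the number of in-flight batches.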

