Pawan

Reputation: 32331

Partition a Set into smaller Subsets and process as batch

I have a continuously running thread in my application that holds a HashSet of all the symbols in the application. As originally designed, the thread's while (true) loop iterates over the HashSet continuously and updates the database for every symbol it contains.

The HashSet may hold up to about 6000 symbols. I don't want to update the database with all 6000 symbols at once. Instead, I want to divide the HashSet into subsets of 500 each (12 subsets), process each subset individually, and have the thread sleep for 15 minutes after each subset, to reduce the pressure on the database.

How can I partition a Set into smaller subsets and process them one at a time? I have seen examples for partitioning an ArrayList and a TreeSet, but did not find one for a HashSet.

This is my code (sample snippet):

package com.ubsc.rewji.threads;

import java.util.Arrays;
import java.util.Collections;
import java.util.HashSet;
import java.util.Iterator;
import java.util.Set;
import java.util.concurrent.PriorityBlockingQueue;

public class TaskerThread extends Thread {
    private PriorityBlockingQueue<String> priorityBlocking = new PriorityBlockingQueue<String>();
    String symbols[] = new String[] { "One", "Two", "Three", "Four" };
    Set<String> allSymbolsSet = Collections
            .synchronizedSet(new HashSet<String>(Arrays.asList(symbols)));

    public void addsymbols(String commaDelimSymbolsList) {
        if (commaDelimSymbolsList != null) {
            String[] symAr = commaDelimSymbolsList.split(",");
            for (int i = 0; i < symAr.length; i++) {
                priorityBlocking.add(symAr[i]);
            }
        }
    }

    public void run() {
        while (true) {
            try {
                while (priorityBlocking.peek() != null) {
                    String symbol = priorityBlocking.poll();
                    allSymbolsSet.add(symbol);
                }
                Iterator<String> ite = allSymbolsSet.iterator();
                System.out.println("=======================");
                while (ite.hasNext()) {
                    String symbol = ite.next();
                    if (symbol != null && symbol.trim().length() > 0) {
                        try {
                            updateDB(symbol);

                        } catch (Exception e) {
                            e.printStackTrace();
                        }
                    }
                }
                Thread.sleep(2000);
            } catch (Exception e) {
                e.printStackTrace();
            }
        }
    }

    public void updateDB(String symbol) {
        System.out.println("THE SYMBOL BEING UPDATED IS" + "  " + symbol);
    }

    public static void main(String args[]) {
        TaskerThread taskThread = new TaskerThread();
        taskThread.start();

        String commaDelimSymbolsList = "ONVO,HJI,HYU,SD,F,SDF,ASA,TRET,TRE,JHG,RWE,XCX,WQE,KLJK,XCZ";
        taskThread.addsymbols(commaDelimSymbolsList);

    }

}

Upvotes: 26

Views: 57205

Answers (6)

Andrey Chaschev

Reputation: 16516

With Guava:

for (List<String> partition : Iterables.partition(yourSet, 500)) {
    // ... handle partition ...
}

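Or Apache Commons Collections, whose ListUtils.partition needs a List (copy the set into one first, e.g. new ArrayList<>(yourSet)):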

for (List<String> partition : ListUtils.partition(yourList, 500)) {
    // ... handle partition ...
}
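
To connect this to the loop in the question, here is a minimal sketch of handling a set batch by batch with a pause in between. It uses only the JDK so that it runs without Guava or Commons on the classpath. The class `BatchRunner`, the methods `partition` and `processInBatches`, and the `pauseMillis` parameter are hypothetical names, and the `Consumer` stands in for the question's `updateDB` call.

```java
import java.util.ArrayList;
import java.util.LinkedHashSet;
import java.util.List;
import java.util.Set;
import java.util.function.Consumer;

class BatchRunner {

    // Copies the set into consecutive lists of at most batchSize elements.
    static <T> List<List<T>> partition(Set<T> source, int batchSize) {
        if (batchSize < 1) {
            throw new IllegalArgumentException("batchSize must be positive");
        }
        List<List<T>> batches = new ArrayList<>();
        List<T> current = new ArrayList<>(batchSize);
        for (T item : source) {
            current.add(item);
            if (current.size() == batchSize) {
                batches.add(current);
                current = new ArrayList<>(batchSize);
            }
        }
        if (!current.isEmpty()) {
            batches.add(current); // last, possibly shorter, batch
        }
        return batches;
    }

    // Handles each batch, sleeping between batches (not after the last one).
    // Returns the number of batches fully handled; stops early if interrupted.
    static <T> int processInBatches(Set<T> source, int batchSize, long pauseMillis,
                                    Consumer<T> handler) {
        List<List<T>> batches = partition(source, batchSize);
        int done = 0;
        for (List<T> batch : batches) {
            for (T item : batch) {
                handler.accept(item);
            }
            done++;
            if (done < batches.size()) {
                try {
                    Thread.sleep(pauseMillis);
                } catch (InterruptedException e) {
                    Thread.currentThread().interrupt(); // keep the interrupt flag set
                    return done;
                }
            }
        }
        return done;
    }

    public static void main(String[] args) {
        Set<String> symbols = new LinkedHashSet<>();
        for (int i = 0; i < 6000; i++) {
            symbols.add("SYM" + i);
        }
        // pauseMillis is 0 here so the demo finishes immediately;
        // the 15 minutes from the question would be 15 * 60 * 1000L.
        int batches = processInBatches(symbols, 500, 0L, symbol -> { });
        System.out.println(batches); // prints 12
    }
}
```

The question wraps its set with Collections.synchronizedSet, so if another thread could modify the set during iteration, the copy in `partition` would need to run inside a `synchronized (set)` block. In the sample only the worker thread touches the set, so the sketch leaves that out.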

Upvotes: 91

Rimjhim Doshi

Reputation: 69

If you are not too worried about the extra memory, you can do it cleanly with Guava's Lists.partition:

List<List<T>> partitionList = Lists.partition(new ArrayList<>(inputSet), PARTITION_SIZE);
List<Set<T>> partitionSet = partitionList.stream().<Set<T>>map(HashSet::new).collect(Collectors.toList());

Upvotes: 3

Aman

Reputation: 1744

The Guava solution from @Andrey_chaschev seems the best, but if you cannot use Guava, I believe the following would help:

public static List<Set<String>> partition(Set<String> set, int chunk) {
    if (set == null || set.isEmpty() || chunk < 1)
        return new ArrayList<>();

    List<Set<String>> partitionedList = new ArrayList<>();
    // Number of chunks, rounded up so the last partial chunk is included.
    int chunkCount = (set.size() + chunk - 1) / chunk;

    // Each chunk re-streams the set and skips what earlier chunks took, so the
    // set must not change while this runs; a single iterator pass is cheaper.
    for (int i = 0; i < chunkCount; i++) {
        partitionedList.add(set.stream().skip((long) i * chunk).limit(chunk).collect(Collectors.toSet()));
    }

    return partitionedList;
}

Upvotes: 1

Amir Pashazadeh

Reputation: 7322

Do something like this (Type is a placeholder for your element type):

private static final int PARTITIONS_COUNT = 12;

List<Set<Type>> theSets = new ArrayList<Set<Type>>(PARTITIONS_COUNT);
for (int i = 0; i < PARTITIONS_COUNT; i++) {
    theSets.add(new HashSet<Type>());
}

int index = 0;
for (Type object : originalSet) {
    theSets.get(index++ % PARTITIONS_COUNT).add(object);
}

Now you have partitioned the originalSet into 12 other HashSets.
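
The partition count above is fixed at 12, so the subsets grow if the set does. As a sketch of one possible variation, the count can be derived from a maximum subset size instead. The class `RoundRobinPartition` and the `maxSize` parameter are hypothetical names chosen for illustration.

```java
import java.util.ArrayList;
import java.util.HashSet;
import java.util.List;
import java.util.Set;

class RoundRobinPartition {

    // Deals elements round-robin into just enough sets that none exceeds maxSize.
    static <T> List<Set<T>> partition(Set<T> source, int maxSize) {
        if (maxSize < 1) {
            throw new IllegalArgumentException("maxSize must be positive");
        }
        // Ceiling division: 6000 elements with maxSize 500 gives 12 sets.
        int count = (source.size() + maxSize - 1) / maxSize;
        List<Set<T>> parts = new ArrayList<>(count);
        for (int i = 0; i < count; i++) {
            parts.add(new HashSet<>());
        }
        int index = 0;
        for (T item : source) {
            parts.get(index++ % count).add(item);
        }
        return parts;
    }

    public static void main(String[] args) {
        Set<Integer> source = new HashSet<>();
        for (int i = 0; i < 6000; i++) {
            source.add(i);
        }
        List<Set<Integer>> parts = partition(source, 500);
        System.out.println(parts.size());        // prints 12
        System.out.println(parts.get(0).size()); // prints 500
    }
}
```

Unlike the consecutive-chunk approaches, round-robin dealing spreads a remainder across the sets rather than leaving one short set at the end.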

Upvotes: 11

PipoTells

Reputation: 571

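We can use the following approach to divide a Set.

For the sample main method, the output is [a, b], [c, d] and [e], one set per line. HashSet iteration order is not guaranteed in general, so the grouping may differ for other elements.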

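private static List<Set<String>> partitionSet(Set<String> set, int partitionSize)
{
    if(partitionSize < 1)
    {
        // A non-positive size would otherwise loop forever adding empty sets.
        throw new IllegalArgumentException("partitionSize must be positive");
    }
    List<Set<String>> list = new ArrayList<>();
    Iterator<String> iterator = set.iterator();

    // Each pass of the outer loop fills one subset with up to partitionSize elements.
    while(iterator.hasNext())
    {
        Set<String> newSet = new HashSet<>();
        for(int j = 0; j < partitionSize && iterator.hasNext(); j++)
        {
            newSet.add(iterator.next());
        }
        list.add(newSet);
    }
    return list;
}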

public static void main(String[] args)
{
    Set<String> set = new HashSet<>();
    set.add("a");
    set.add("b");
    set.add("c");
    set.add("d");
    set.add("e");

    int size = 2;
    List<Set<String>> list = partitionSet(set, size);

    for(int i = 0; i < list.size(); i++)
    {
        Set<String> s = list.get(i);
        System.out.println(s);
    }
}

Upvotes: 3

TwoThe

Reputation: 14289

A very simple way to solve your actual problem would be to change your code as follows, so that each pass handles at most 500 symbols:

Iterator<String> ite = allSymbolsSet.iterator();
System.out.println("=======================");
int remaining = 500;
while (remaining-- > 0 && ite.hasNext()) {

A more general method is to use the iterator to take the elements out one by one in a simple loop (note that ite.remove() also removes them from the original set):

int remaining = 500;
while (remaining-- > 0 && ite.hasNext()) {
  sublist.add(ite.next());
  ite.remove();
}
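
As a runnable sketch of this "take the elements out" approach, the loop can be wrapped in a small helper. The class `SetDrainer` and the method `drain` are hypothetical names. Since the drained elements leave the source set, a caller that needs to revisit them later (as the question's loop does) would have to add them back.

```java
import java.util.ArrayList;
import java.util.HashSet;
import java.util.Iterator;
import java.util.List;
import java.util.Set;

class SetDrainer {

    // Moves up to max elements out of source into a new list.
    static <T> List<T> drain(Set<T> source, int max) {
        List<T> batch = new ArrayList<>();
        Iterator<T> it = source.iterator();
        while (batch.size() < max && it.hasNext()) {
            batch.add(it.next());
            it.remove(); // the element no longer belongs to source
        }
        return batch;
    }

    public static void main(String[] args) {
        Set<String> symbols = new HashSet<>();
        for (int i = 0; i < 1200; i++) {
            symbols.add("SYM" + i);
        }
        // Keep draining until the source is empty: batches of 500, 500, 200.
        while (!symbols.isEmpty()) {
            List<String> batch = drain(symbols, 500);
            System.out.println(batch.size() + " drained, " + symbols.size() + " left");
        }
    }
}
```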

Upvotes: 0
