Reputation: 32331
I have a continuous running thread in my application, which consists of a HashSet to store all the symbols inside the application. As per the design at the time it was written, inside the thread's while true condition it will iterate the HashSet continuously, and update the database for all the symbols contained inside HashSet.
The maximum number of symbols that might be present inside the HashSet will be around 6000. I don't want to update the DB with all the 6000 symbols at once, but divide this HashSet into different subsets of 500 each (12 sets) and execute each subset individually and have a thread sleep after each subset for 15 minutes, so that I can reduce the pressure on the database.
This is my code (sample code snippet)
How can I partition a set into smaller subsets and process (I have seen the examples for partitioning ArrayList, TreeSet, but didn't find any example related to HashSet)
package com.ubsc.rewji.threads;
import java.util.Arrays;
import java.util.Collections;
import java.util.HashSet;
import java.util.Iterator;
import java.util.Set;
import java.util.concurrent.PriorityBlockingQueue;
public class TaskerThread extends Thread {
private PriorityBlockingQueue<String> priorityBlocking = new PriorityBlockingQueue<String>();
String symbols[] = new String[] { "One", "Two", "Three", "Four" };
Set<String> allSymbolsSet = Collections
.synchronizedSet(new HashSet<String>(Arrays.asList(symbols)));
public void addsymbols(String commaDelimSymbolsList) {
if (commaDelimSymbolsList != null) {
String[] symAr = commaDelimSymbolsList.split(",");
for (int i = 0; i < symAr.length; i++) {
priorityBlocking.add(symAr[i]);
}
}
}
public void run() {
while (true) {
try {
while (priorityBlocking.peek() != null) {
String symbol = priorityBlocking.poll();
allSymbolsSet.add(symbol);
}
Iterator<String> ite = allSymbolsSet.iterator();
System.out.println("=======================");
while (ite.hasNext()) {
String symbol = ite.next();
if (symbol != null && symbol.trim().length() > 0) {
try {
updateDB(symbol);
} catch (Exception e) {
e.printStackTrace();
}
}
}
Thread.sleep(2000);
} catch (Exception e) {
e.printStackTrace();
}
}
}
public void updateDB(String symbol) {
System.out.println("THE SYMBOL BEING UPDATED IS" + " " + symbol);
}
public static void main(String args[]) {
TaskerThread taskThread = new TaskerThread();
taskThread.start();
String commaDelimSymbolsList = "ONVO,HJI,HYU,SD,F,SDF,ASA,TRET,TRE,JHG,RWE,XCX,WQE,KLJK,XCZ";
taskThread.addsymbols(commaDelimSymbolsList);
}
}
Upvotes: 26
Views: 57205
Reputation: 16516
With Guava:
for (List<String> partition : Iterables.partition(yourSet, 500)) {
// ... handle partition ...
}
Or Apache Commons:
for (List<String> partition : ListUtils.partition(yourList, 500)) {
// ... handle partition ...
}
Upvotes: 91
Reputation: 69
If you are not worried much about space complexity, you can do like this in a clean way :
List<List<T>> partitionList = Lists.partition(new ArrayList<>(inputSet), PARTITION_SIZE);
List<Set<T>> partitionSet = partitionList.stream().map((Function<List<T>, HashSet>) HashSet::new).collect(Collectors.toList());
Upvotes: 3
Reputation: 1744
The Guava solution from @Andrey_chaschev seems the best, but in case it is not possible to use it, I believe the following would help
public static List<Set<String>> partition(Set<String> set, int chunk) {
if(set == null || set.isEmpty() || chunk < 1)
return new ArrayList<>();
List<Set<String>> partitionedList = new ArrayList<>();
double loopsize = Math.ceil((double) set.size() / (double) chunk);
for(int i =0; i < loopsize; i++) {
partitionedList.add(set.stream().skip((long)i * chunk).limit(chunk).collect(Collectors.toSet()));
}
return partitionedList;
}
Upvotes: 1
Reputation: 7322
Do something like
private static final int PARTITIONS_COUNT = 12;
List<Set<Type>> theSets = new ArrayList<Set<Type>>(PARTITIONS_COUNT);
for (int i = 0; i < PARTITIONS_COUNT; i++) {
theSets.add(new HashSet<Type>());
}
int index = 0;
for (Type object : originalSet) {
theSets.get(index++ % PARTITIONS_COUNT).add(object);
}
Now you have partitioned the originalSet
into 12 other HashSets.
Upvotes: 11
Reputation: 571
We can use the following approach to divide a Set.
We will get the output as [a, b] [c, d] [e]`
private static List<Set<String>> partitionSet(Set<String> set, int partitionSize)
{
List<Set<String>> list = new ArrayList<>();
int setSize = set.size();
Iterator iterator = set.iterator();
while(iterator.hasNext())
{
Set newSet = new HashSet();
for(int j = 0; j < partitionSize && iterator.hasNext(); j++)
{
String s = (String)iterator.next();
newSet.add(s);
}
list.add(newSet);
}
return list;
}
public static void main(String[] args)
{
Set<String> set = new HashSet<>();
set.add("a");
set.add("b");
set.add("c");
set.add("d");
set.add("e");
int size = 2;
List<Set<String>> list = partitionSet(set, 2);
for(int i = 0; i < list.size(); i++)
{
Set<String> s = list.get(i);
System.out.println(s);
}
}
Upvotes: 3
Reputation: 14289
A very simple way for your actual problem would be to change your code as follows:
Iterator<String> ite = allSymbolsSet.iterator();
System.out.println("=======================");
int i = 500;
while ((--i > 0) && ite.hasNext()) {
A general method would be to use the iterator to take the elements out one by one in a simple loop:
int i = 500;
while ((--i > 0) && ite.hasNext()) {
sublist.add(ite.next());
ite.remove();
}
Upvotes: 0