Reputation: 115
I have an ArrayList (a big one, read from a file) and I want to process its contents with multithreading: call a method repeatedly on each string and print the result to a file. I have given a working structure of what my code looks like. However, I am not able to code what I want without getting tangled in exceptions related to thread synchronization. I am new to the concept of threading and want an efficient way to do this. I have looked at other solutions related to threading and ArrayLists, but they haven't worked out for me. Any suggestions as to how to go about this are appreciated.
import java.io.BufferedReader;
import java.io.File;
import java.io.FileNotFoundException;
import java.io.FileReader;
import java.io.IOException;
import java.io.PrintStream;
import java.io.UnsupportedEncodingException;
import java.util.ArrayList;
public class ThreadingWithMethod {

    public static void main(String[] args) throws FileNotFoundException, UnsupportedEncodingException {
        ArrayList<String> samples = readurls("path/to/sample.csv");
        PrintStream filewriter = new PrintStream(new File("path/to/result.csv"), "UTF-8");
        for (int i = 0; i < samples.size(); i++) {
            String string = samples.get(i);
            // Need info as to how to process with threading without clashing.
            // sampleProcessString needs to be called repeatedly,
            // i.e. sampleProcessString(filewriter, string), by 2-3 threads.
        }
    }
    public static void sampleProcessString(PrintStream filewriter, String string) {
        filewriter.println(processedString(string));
    }

    private static String processedString(String string) {
        // Intended to generate a new line by using an SQL query.
        // This method will be using a connection to a MySQL database based on the sample.
        return string + "++> done something";
    }
    public static ArrayList<String> readurls(String filename) {
        ArrayList<String> aslink = new ArrayList<String>();
        try (BufferedReader reader = new BufferedReader(new FileReader(filename))) {
            String line;
            while ((line = reader.readLine()) != null) {
                aslink.add(line);
            }
        } catch (IOException e) {
            e.printStackTrace();
        }
        return aslink;
    }
}
Upvotes: 0
Views: 414
Reputation: 13195
I created some snippets into which you can try dropping your actual processing code.
My test data looks like this:
try (PrintWriter pw = new PrintWriter("testdata.txt")) {
    for (int i = 0; i < 1000000; i++)
        pw.println(i);
}
So: a text file with a million numbers, one per line.
My "task" was to create a file containing double the value of the same lines, disregarding their order:
pw.println(Integer.parseInt(line) * 2);
where line is a line from the input file, and pw is a PrintWriter for the output.
In actual code:
try (PrintWriter pw = new PrintWriter("testresult.txt");
        BufferedReader br = new BufferedReader(new FileReader("testdata.txt"))) {
    String line;
    while ((line = br.readLine()) != null)
        pw.println(Integer.parseInt(line) * 2);
}
This can be written more concisely, and perhaps a bit more readably, with streams:
try (PrintWriter pw = new PrintWriter("testresult.txt")) {
    Files.lines(Paths.get("testdata.txt")).forEach(
            line -> pw.println(Integer.parseInt(line) * 2));
}
The two snippets produce very similar execution times, around 1.6-1.7 seconds on my machine (measured with the "budget" approach: long start = System.currentTimeMillis(); before and System.out.println(System.currentTimeMillis() - start); after).
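Spelled out, that budget measurement is just:

long start = System.currentTimeMillis();
// ... run the snippet being measured ...
System.out.println(System.currentTimeMillis() - start);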
Then the stream can be parallelized with a single .parallel() inside:
try (PrintWriter pw = new PrintWriter("testresult.txt")) {
    Files.lines(Paths.get("testdata.txt")).parallel().forEach(
            line -> pw.println(Integer.parseInt(line) * 2));
}
This will produce mixed order results.
A side remark on println(int): it is not documented as such, but its actual implementation is thread-safe. However, if you want to be absolutely "safe" and build only on the documented behavior, you should synchronize yourself:
try (PrintWriter pw = new PrintWriter("testresult.txt")) {
    Files.lines(Paths.get("testdata.txt")).parallel().forEach(line -> {
        synchronized (pw) {
            pw.println(Integer.parseInt(line) * 2);
        }
    });
}
Both of them are actually slower than the sequential version (2 and 2.2 seconds respectively; the extra manual synchronization does matter), but of course it matters a lot that this processing step is very simple. So it is important to keep in mind that if file operations eat up the time in your case too, parallelism cannot really help with that.
And for comparison, a complete snippet using a thread pool:
ExecutorService es = Executors.newFixedThreadPool(Runtime.getRuntime().availableProcessors());
ExecutorCompletionService<String> ecs = new ExecutorCompletionService<String>(es);
int counter = 0;
// Submit one task per line; the workers only calculate and stringify.
try (BufferedReader br = new BufferedReader(new FileReader("testdata.txt"))) {
    String line;
    while ((line = br.readLine()) != null) {
        final String current = line;
        ecs.submit(new Callable<String>() {
            @Override
            public String call() throws Exception {
                return Integer.toString(Integer.parseInt(current) * 2);
            }
        });
        counter++;
    }
}
// Drain the completion service in the main thread; all file writing happens here.
try (PrintWriter pw = new PrintWriter("testresult.txt")) {
    while (counter > 0) {
        pw.println(ecs.take().get());
        counter--;
    }
}
es.shutdown();
This one is the longest of them for sure; on the other hand it runs for 2 seconds, so it is comparable to the synchronized-less streaming sample, and it is "safe" without the synchronization because the file operations all happen in the main thread (the workers only calculate and stringify). Going for fully manual threads could make things even more verbose, but I don't feel motivated to write such code at the moment.
Upvotes: 0
Reputation: 109547
Reading a large file is fastest done sequentially, because of the physical disk access.
One might use memory mapped byte buffers.
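As a rough sketch of the memory-mapped variant (the file name is a placeholder, and splitting the bytes into lines is left out):

import java.nio.MappedByteBuffer;
import java.nio.channels.FileChannel;
import java.nio.file.Paths;
import java.nio.file.StandardOpenOption;

// Map the whole file into memory and walk the bytes directly.
try (FileChannel ch = FileChannel.open(Paths.get("input.csv"),
        StandardOpenOption.READ)) {
    MappedByteBuffer buf = ch.map(FileChannel.MapMode.READ_ONLY, 0, ch.size());
    while (buf.hasRemaining()) {
        byte b = buf.get(); // consume raw bytes; line splitting is up to you
    }
}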
In your case (processing per line), Files.lines(Path) (default UTF-8) may suffice.
This is fine-grained concurrency. The processing may be done in parallel using a thread pool (ExecutorService, ThreadPoolExecutor), spreading the work over many threads.
You will get the results out of order. If that is a problem, pass a line number along in the Files.lines lambda.
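Files.lines itself does not carry indices, so one workaround (a sketch; process is a hypothetical stand-in for the real per-line work) is to read the lines first and parallelize over their indices, writing the results back in order:

import java.io.PrintWriter;
import java.nio.file.Files;
import java.nio.file.Paths;
import java.util.List;
import java.util.stream.IntStream;

List<String> lines = Files.readAllLines(Paths.get("input.csv"));
String[] results = new String[lines.size()];
// Process in parallel, keyed by line number, so order can be restored.
IntStream.range(0, lines.size()).parallel()
        .forEach(i -> results[i] = process(lines.get(i))); // process(...) is hypothetical
try (PrintWriter pw = new PrintWriter("result.csv")) {
    for (String result : results)
        pw.println(result);
}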
For collecting the results, queueing them in memory, and asynchronously writing them to a file, one could look at whether there is a high-performance logger. Probably one would have to reimplement its functionality (to do away with log formatting). So: a queuing thread and a thread for writing to the file (with a large byte buffer).
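A minimal sketch of that queuing setup, assuming the worker threads call queue.put(...) with finished lines (the poison-pill end marker is an ad-hoc convention, not a library feature):

import java.io.PrintWriter;
import java.util.concurrent.BlockingQueue;
import java.util.concurrent.LinkedBlockingQueue;

BlockingQueue<String> queue = new LinkedBlockingQueue<>();
final String POISON = "<eof>"; // hypothetical end marker
Thread writer = new Thread(() -> {
    try (PrintWriter pw = new PrintWriter("result.csv")) {
        String s;
        while (!(s = queue.take()).equals(POISON))
            pw.println(s); // only this thread touches the file
    } catch (Exception e) {
        e.printStackTrace();
    }
});
writer.start();
// ... worker threads call queue.put(processedLine) ...
queue.put(POISON); // after all workers are done, tell the writer to stop
writer.join();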
One might consider compressing the output (.csv.gz), which would be a space/time gain on further network transport.
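Compressing the output is mostly a matter of wrapping the stream; a sketch with java.util.zip (file name assumed):

import java.io.FileOutputStream;
import java.io.OutputStreamWriter;
import java.io.PrintWriter;
import java.nio.charset.StandardCharsets;
import java.util.zip.GZIPOutputStream;

// Same PrintWriter usage as before, but the bytes land gzip-compressed.
try (PrintWriter pw = new PrintWriter(new OutputStreamWriter(
        new GZIPOutputStream(new FileOutputStream("result.csv.gz")),
        StandardCharsets.UTF_8))) {
    pw.println("some,processed,line");
}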
There are many ways of realizing this, so research the javadoc and its variations (FutureTask, for instance) and look into examples.
ThreadPoolExecutor executor = (ThreadPoolExecutor)
        Executors.newFixedThreadPool(10);
for (;;) { // loop over the work items, breaking when there are no more
    Task task = new Task(...);
    executor.execute(task);
}
executor.shutdown();                     // no new tasks accepted after this
while (!executor.isTerminated()) { ... } // wait for the submitted tasks to finish
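For instance, the FutureTask variation mentioned above could look roughly like this (the computation is a stand-in for the real per-line work):

import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.FutureTask;

ExecutorService executor = Executors.newFixedThreadPool(10);
FutureTask<String> task = new FutureTask<>(() -> "42" + "++> done something");
executor.execute(task);         // FutureTask is a Runnable, so execute() accepts it
System.out.println(task.get()); // blocks until the result is ready
executor.shutdown();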
Upvotes: 1