Reputation: 158
im implementing external merge sort using Java.
So given a file I split it into smaller ones , then sort the smaller portions and finally merge the sorted (smaller) files.
So , the last step is what im having trouble with.
I have a list of files and I want at each step , take the minimum value of the first rows of each file and then remove that line.
So , it is supposed to be something like this:
public static void mergeSortedFiles(List<File> sorted, File output) throws IOException {
BufferedWriter wf = new BufferedWriter(new FileWriter(output));
String curLine = "";
while(!sorted.isEmpty()) {
curLine = findMinLine(sorted);
wf.write(curLine);
}
}
public static String findMinLine(List<File> sorted) throws IOException {
List<BufferedReader> brs = new ArrayList<>();
for(int i =0; i<sorted.size() ; i++) {
brs.add(new BufferedReader(new FileReader(sorted.get(i))));
}
List<String> lines = new ArrayList<>();
for(BufferedReader br : brs) {
lines.add(br.readLine());
}
Collections.sort(lines);
return lines.get(0);
}
Im not sure how to update the files, anyone can help with that?
Thanks for helping!
Upvotes: 0
Views: 3705
Reputation: 25950
You can create a Comparable
wrapper around each file and then place the wrappers in a heap (for example a PriorityQueue
).
public class ComparableFile<T extends Comparable<T>> implements Comparable<ComparableFile<T>> {
private final Deserializer<T> deserializer;
private final Iterator<String> lines;
private T buffered;
public ComparableFile(File file, Deserializer<T> deserializer) {
this.deserializer = deserializer;
try {
this.lines = Files.newBufferedReader(file.toPath()).lines().iterator();
} catch (IOException e) {
// deal with it differently if you want, I'm just providing a working example
// and wanted to use the constructor in a lambda function
throw new UncheckedIOException(e);
}
}
@Override
public int compareTo(ComparableFile<T> that) {
T mine = peek();
T theirs = that.peek();
if (mine == null) return theirs == null ? 0 : -1;
if (theirs == null) return 1;
return mine.compareTo(theirs);
}
public T pop() {
T tmp = peek();
if (tmp != null) {
buffered = null;
return tmp;
}
throw new NoSuchElementException();
}
public boolean isEmpty() {
return peek() == null;
}
private T peek() {
if (buffered != null) return buffered;
if (!lines.hasNext()) return null;
return buffered = deserializer.deserialize(lines.next());
}
}
Then, you can merge them this way:
public class MergeFiles<T extends Comparable<T>> {
private final PriorityQueue<ComparableFile<T>> files;
public MergeFiles(List<File> files, Deserializer<T> deserializer) {
this.files = new PriorityQueue<>(files.stream()
.map(file -> new ComparableFile<>(file, deserializer))
.filter(comparableFile -> !comparableFile.isEmpty())
.collect(toList()));
}
public Iterator<T> getSortedElements() {
return new Iterator<T>() {
@Override
public boolean hasNext() {
return !files.isEmpty();
}
@Override
public T next() {
if (!hasNext()) throw new NoSuchElementException();
ComparableFile<T> head = files.poll();
T next = head.pop();
if (!head.isEmpty()) files.add(head);
return next;
}
};
}
}
And here's some code to demonstrate it works:
public static void main(String[] args) throws IOException {
List<File> files = Arrays.asList(
newTempFile(Arrays.asList("hello", "world")),
newTempFile(Arrays.asList("english", "java", "programming")),
newTempFile(Arrays.asList("american", "scala", "stackoverflow"))
);
Iterator<String> sortedElements = new MergeFiles<>(files, line -> line).getSortedElements();
while (sortedElements.hasNext()) {
System.out.println(sortedElements.next());
}
}
private static File newTempFile(List<String> words) throws IOException {
File tempFile = File.createTempFile("sorted-", ".txt");
Files.write(tempFile.toPath(), words);
tempFile.deleteOnExit();
return tempFile;
}
Output:
american
english
hello
java
programming
scala
stackoverflow
world
Upvotes: 2
Reputation: 18408
The standard merge technique between a fixed number of files (say, 2) is :
if (key_1.compareTo(key_2) == 0) { process both files ; then read both files} else if (key_1.compareTo(key_2) == -1) { process file 1 ; then read file 1} else { process file 2 ; then read file 2}
Note how this code does essentially nothing more than determine the file with the lowest key, and process that.
If your number of files is variable, then your number of key variables is variable too, and "determining the file with the lowest current key" cannot be done as per above. Instead, have as many current_key_value objects as there are files, and store them all in a TreeSet. Now, the first element of the TreeSet will be the lowest current key value of all the files and if you make sure that you maintain a link between your key variable and the file number you just process that file (and delete the just processed key value from the TreeSet and read a new record from the processed file and add its key value to the TreeSet).
Upvotes: 0
Reputation: 1
So what you want to do is to swap two lines in a text file? You can do it by using a RandomAccessFile
however this will be horrible slow since everytime when you swap two lines you have to wait for the next IO burst.
So i highly recommend you to use the following code to be able to do the merge sort on the heap:
List<String> lines1 = Files.readAllLines(youFile1);
List<String> lines2 = Files.readAllLines(youFile2);
//use merge sort on theese lines
List<String> merged;
FileWriter writer = new FileWriter(yourOutputFile);
for(String str: merged) {
writer.write(str + System.lineSeparator());
}
writer.close();
Upvotes: 0