Reputation: 8923
I've a bunch of id's in 4 files a,b,c,d saved in my workspace. I want to merge all these id's in a sorted order in a single file merged.txt .. They will be saved one per line as a string. I could individually sort files by bringing them into memory. But how do I merge them, there can be duplicate entries. I can't think of how to compare each entry in the four files (they can grow to 8 so can't hardcode this). Particularly how do I compare the entries and how do I advance only those file pointers that are the smallest of the entries ?
public void sortFile() throws IOException
{
File a = new File("/Users/phoenix/workspace/data/a.txt");
File b = new File("/Users/phoenix/workspace/data/b.txt");
File c = new File("/Users/phoenix/workspace/data/c.txt");
File d = new File("/Users/phoenix/workspace/data/d.txt");
doSort(a);
doSort(b);
doSort(c);
doSort(d);
merge();
}
How do I modify the merge method according to pseudocode below ?
public void merge()
{
File dir = new File("/Users/phoenix/workspace/data");
for(File f: dir.listFiles())
{
// toDo: merge into a single file merged.txt
}
}
public void doSort(File f) throws IOException
{
BufferedReader reader = new BufferedReader(new FileReader(f));
String line;
ArrayList<String> list = new ArrayList<String>();
while((line = reader.readLine())!=null)
{
list.add(line);
}
Collections.sort(list);
PrintWriter out = new PrintWriter(f);
for(String s:list)
out.println(s);
reader.close();
out.close();
}
public void merge() throws IOException
{
File dir = new File("/Users/phoenix/workspace/data");
File merged = new File("/Users/phoenix/workspace/data/merged.txt");
ArrayList<BufferedReader> readers = new ArrayList<BufferedReader>(dir.listFiles().length);
ArrayList<String> list = new ArrayList<String>();
PrintWriter out = new PrintWriter(merged);
for(File f: dir.listFiles())
{
readers.add(new BufferedReader(new FileReader(f)));
}
while(true)
{
for (BufferedReader reader: readers)
{
if(reader.readLine()!=null)
list.add(reader.readLine());
else
{
reader.close();
}
}
String min = Collections.min(list);
int index = list.indexOf(min);
out.write(min);
}
}
Upvotes: 2
Views: 1864
Reputation: 726589
Here is a general description of the algorithm:
Before going through with the algorithm your code needs to check that at least one input file is present; otherwise, your code should exit.
EDIT : Your merge
code does not look much like the algorithm above; here is some code to help you get started:
// Prepare your readers and their top items
for(File f: dir.listFiles()) {
BufferedReader br = new BufferedReader(new FileReader(f));
String firstLine = reader.readLine();
// Your code inserts buffered readers unconditionally;
// You should not insert readers for empty files.
if (firstLine != null) {
readers.add(br);
list.add(firstLine);
} else {
br.close();
}
}
// Stop when the last reader is removed
while (!readers.isEmpty()) {
int minIndex = ... // Find the index of the smallest item in the "list"
out.write(list.get(minIndex));
BufferedReader br = readers.get(minIndex);
String next = br.readLine();
if (next != null) {
list.set(minIndex, next);
} else {
br.close();
list.remove(minIndex);
readers.remove(minIndex);
}
}
Upvotes: 1
Reputation: 1935
Are you looking to solve your problem, or solve it in Java.
If you are simply looking for ways to do this, and have access to the terminal, and by "sorting" you mean sorting alphabetically, you can do it simpler.
cat "/Users/phoenix/workspace/data/a.txt" "/Users/phoenix/workspace/data/b.txt" "/Users/phoenix/workspace/data/c.txt" "/Users/phoenix/workspace/data/d.txt"|sort > merged.txt
For sorting and only picking up the uniq ones
cat "/Users/phoenix/workspace/data/a.txt" "/Users/phoenix/workspace/data/b.txt" "/Users/phoenix/workspace/data/c.txt" "/Users/phoenix/workspace/data/d.txt"|sort |uniq > merged.txt
Update: BTW, to sort numerically, use
sort -n
Upvotes: 2
Reputation: 136022
Read each file into a List
List<String> list1 = Files.readAllLines(Path.get(path), StandardCharsets.UTF_8);
...
merge lists1 in one list
List<String> list = new ArrayList<>();
list.addAll(list1);
...
now sort the lines
Collections.sort(list);
and write them into a single file.
Note: If you dont want duplicate lines use TreeSet instead of ArrayList
Upvotes: 0