Reputation: 31
Following is the code written for tailing 'n' no of lines of a file.
import java.io.RandomAccessFile;
import java.util.HashMap;
import java.util.Map;

class TailCommand {
    public static void main(String args[]) {
        try {
            /*
             * Receive the file name and the number of lines to tail as
             * command line arguments
             */
            RandomAccessFile randomFile = new RandomAccessFile(args[0], "r");
            long numberOfLines = Long.valueOf(args[1]).longValue();
            long lineno = 0;
            String str;
            String outstr;
            StringBuilder sb = new StringBuilder();
            Map<Long, String> strmap = new HashMap<Long, String>();
            // Read every line into the map, keyed by its 1-based line number
            while ((str = randomFile.readLine()) != null) {
                strmap.put(lineno + 1, str);
                lineno++;
            }
            System.out.println("Total no of lines in file is " + lineno);
            // Start at the first of the last numberOfLines lines
            long startPosition = lineno - numberOfLines + 1;
            while (startPosition <= lineno) {
                if (strmap.containsKey(startPosition)) {
                    // System.out.println("HashMap contains " + startPosition + " as key");
                    outstr = strmap.get(startPosition);
                    sb.append(outstr);
                    System.out.println(outstr);
                }
                startPosition++;
            }
            // Collection coll = strmap.values();
            // System.out.println(coll + " size " + strmap.size());
            // System.out.println(sb);
        } catch (Exception e) {
            e.printStackTrace();
        }
    }
}
I used the following approach: the file name and the number of lines to tail are received as command line arguments.

My doubts:

Is my approach valid, and can I use it for large files greater than 10 MB in size? What improvements do I need to make if more people have to tail the same file simultaneously? May I use StringBuilder for larger files also?
Upvotes: 2
Views: 7844
Reputation: 29377
As mentioned in my comment to djna's answer, you're not doing this very efficiently: RandomAccessFile#readLine() does no buffering and reads a single byte at a time, which also causes some possible slowdowns. So, what I'd do would be to read the file in from the end backwards in chunks and process the chunks separately.
// Assumes the surrounding code defines: file (the path to read),
// linesToRead (the requested line count) and pw (the output writer);
// see the harness sketch after the snippet.
RandomAccessFile raf = new RandomAccessFile(new File(file), "r");
List<String> lines = new ArrayList<String>();
final int chunkSize = 1024 * 32;
long end = raf.length();
boolean readMore = true;
while (readMore) {
    byte[] buf = new byte[chunkSize];
    // Read a chunk from the end of the file
    long startPoint = end - chunkSize;
    long readLen = chunkSize;
    if (startPoint < 0) {
        readLen = chunkSize + startPoint;
        startPoint = 0;
    }
    raf.seek(startPoint);
    readLen = raf.read(buf, 0, (int) readLen);
    if (readLen <= 0) {
        break;
    }
    // Parse newlines and add them to an array
    int unparsedSize = (int) readLen;
    int index = unparsedSize - 1;
    while (index >= 0) {
        if (buf[index] == '\n') {
            int startOfLine = index + 1;
            int len = (unparsedSize - startOfLine);
            if (len > 0) {
                lines.add(new String(buf, startOfLine, len));
            }
            unparsedSize = index + 1;
        }
        --index;
    }
    // Move the end point back by the number of bytes we parsed.
    // Note: we have not parsed the first line in the chunked
    // content because it could be a partial line
    end = end - (chunkSize - unparsedSize);
    readMore = lines.size() < linesToRead && startPoint != 0;
}
// Only print the requested number of lines
if (linesToRead > lines.size()) {
    linesToRead = lines.size();
}
for (int i = linesToRead - 1; i >= 0; --i) {
    pw.print(lines.get(i));
}
Upvotes: 6
Reputation: 31
I have modified the code based on the above suggestions. Please see the updated code below.

The logic used is described below:

1. Seek to the EOF using the length of the file.
2. Move the file pointer backwards from EOF and check for occurrences of '\n'.
3. If a '\n' occurrence is found, increment the line counter and put the output of readLine into the HashMap.
4. Retrieve the values from the HashMap in descending order.

I hope the above approach would not cause memory problems and is clear. Please suggest.
import java.io.RandomAccessFile;
import java.util.HashMap;
import java.util.Map;

class NewTailCommand {
    public static void main(String args[]) {
        Map<Long, String> strmap = new HashMap<Long, String>();
        long numberOfLines = Long.valueOf(args[1]).longValue();
        try {
            /*
             * Receive the file name and the number of lines to tail as
             * command line arguments
             */
            RandomAccessFile randomFile = new RandomAccessFile(args[0], "r");
            long filelength = randomFile.length();
            long filepos = filelength - 1;
            System.out.println(filepos);
            // Walk backwards from EOF until enough '\n' bytes are seen
            for (long linescovered = 1; linescovered <= numberOfLines; filepos--) {
                randomFile.seek(filepos);
                if (randomFile.readByte() == 0xA) {
                    if (filepos == filelength - 1) {
                        // Skip the newline that terminates the file
                        continue;
                    }
                    strmap.put(linescovered, randomFile.readLine());
                    linescovered++;
                }
            }
        } catch (Exception e) {
            e.printStackTrace();
        }
        // Print in file order: the deepest key is the earliest line
        long startPosition = numberOfLines;
        while (startPosition != 0) {
            if (strmap.containsKey(startPosition)) {
                // System.out.println("HashMap contains " + startPosition + " as key");
                String outstr = strmap.get(startPosition);
                System.out.println(outstr);
            }
            startPosition--;
        }
    }
}
Upvotes: 0
Reputation: 54816
Is my approach valid, and can I use this approach for large files of size greater than 10MB?

Yes, it is valid. And yes, you "can" use it for larger files, but since you always scan the entire file, performance will degrade the longer the file gets. Similarly, since you store the whole thing in memory, your memory requirements will grow to the point where a very large file starts causing OutOfMemoryError issues.
What improvements do I need to make if more people have to tail the same file simultaneously?

None, since you are only tailing the last n lines; each person can simply run their own instance of the program. If you wanted to follow the file as updates are made over time (like what tail -f does), then you'd have to make some changes; a small polling sketch follows.
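For illustration, a minimal follow-mode sketch (my own illustration, not part of the question's code; the one-second poll interval is an arbitrary choice):

import java.io.RandomAccessFile;

class FollowFile {
    public static void main(String[] args) throws Exception {
        RandomAccessFile raf = new RandomAccessFile(args[0], "r");
        long pos = raf.length();          // start at the current end of file
        while (true) {
            long len = raf.length();
            if (len > pos) {              // the file has grown: print the new bytes
                raf.seek(pos);
                String line;
                while ((line = raf.readLine()) != null) {
                    System.out.println(line);
                }
                pos = raf.getFilePointer();
            }
            Thread.sleep(1000);           // poll once per second
        }
    }
}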
May I use StringBuilder for larger files also?
Of course you may, but it's not clear to me what you would gain.
Personally I would recommend restructuring your algorithm as follows:

1. Seek to the end of the file.
2. Read backwards, counting \n characters, until you have found n of them.
3. Read forwards from that point, printing everything up to the end of the file.

Then there's no need to buffer each line of the file, and there is no performance degradation on very large file sizes. A rough sketch of that restructuring is shown below.
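This sketch is my own illustration of those steps (it reads byte-at-a-time for simplicity; chunked reads as in the earlier answer would be faster):

import java.io.RandomAccessFile;

class TailSeek {
    public static void main(String[] args) throws Exception {
        RandomAccessFile raf = new RandomAccessFile(args[0], "r");
        int n = Integer.parseInt(args[1]);
        long len = raf.length();
        if (len == 0) {
            return;                 // nothing to print for an empty file
        }
        long pos = len - 1;
        raf.seek(pos);
        if (raf.readByte() == '\n') {
            pos--;                  // ignore the newline terminating the file
        }
        long start = 0;             // fall back to the start if there are fewer than n lines
        int newlines = 0;
        // Walk backwards byte by byte until n line breaks have been seen
        while (pos >= 0) {
            raf.seek(pos);
            if (raf.readByte() == '\n' && ++newlines == n) {
                start = pos + 1;
                break;
            }
            pos--;
        }
        // Read forwards from that position, printing up to the end of the file
        raf.seek(start);
        String line;
        while ((line = raf.readLine()) != null) {
            System.out.println(line);
        }
    }
}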
Upvotes: 3
Reputation: 93040
You basically read the whole file into memory - to do that you don't really need a random access file. If the file is huge, that might not be the best option.
Why not use the HashMap to store (line number -> position in the file) instead of (line number -> line)? That way you would know which position to seek to for the last n lines; a sketch of that idea follows.
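A minimal sketch of the position-map idea (my illustration; it keeps a List of offsets rather than a HashMap, since line numbers are consecutive anyway):

import java.io.RandomAccessFile;
import java.util.ArrayList;
import java.util.List;

class TailByPosition {
    public static void main(String[] args) throws Exception {
        RandomAccessFile raf = new RandomAccessFile(args[0], "r");
        int n = Integer.parseInt(args[1]);
        // Remember only the byte offset at which each line starts;
        // the lines themselves are not kept in memory.
        List<Long> lineStarts = new ArrayList<Long>();
        long pos = 0;
        while (raf.readLine() != null) {
            lineStarts.add(pos);      // offset where the line just read began
            pos = raf.getFilePointer();
        }
        if (lineStarts.isEmpty()) {
            return;                   // empty file, nothing to print
        }
        // Seek to the start of the n-th line from the end and print to EOF
        int from = Math.max(0, lineStarts.size() - n);
        raf.seek(lineStarts.get(from));
        String line;
        while ((line = raf.readLine()) != null) {
            System.out.println(line);
        }
    }
}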
Another way would be to use a buffer (array) of n strings holding the last n lines seen so far. But be careful: when reading a new line, you don't want to shift all the elements in the buffer (i.e. 1->0, 2->1, ..., n->(n-1)) and then add the new line at the end. Use a cyclic buffer instead: keep an index into the buffer marking the end position and overwrite the next position when adding a new line; if you are at position n-1, the next is 0, so it is cyclic. A sketch follows.
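A minimal sketch of such a cyclic buffer (my illustration; a BufferedReader replaces the RandomAccessFile since only sequential reading is needed here):

import java.io.BufferedReader;
import java.io.FileReader;

class TailRingBuffer {
    public static void main(String[] args) throws Exception {
        int n = Integer.parseInt(args[1]);
        if (n <= 0) {
            return;
        }
        String[] ring = new String[n];       // holds the last n lines seen so far
        long count = 0;
        BufferedReader reader = new BufferedReader(new FileReader(args[0]));
        String line;
        while ((line = reader.readLine()) != null) {
            ring[(int) (count % n)] = line;  // overwrite the oldest entry
            count++;
        }
        reader.close();
        // Replay the buffer in order, starting at the oldest surviving line
        for (long i = Math.max(0, count - n); i < count; i++) {
            System.out.println(ring[(int) (i % n)]);
        }
    }
}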
Upvotes: 0
Reputation: 55907
Seems like you're keeping the whole file in memory when you only need to keep "n" lines. So instead allocate an array of size n and use it as a ring buffer.
In the code you show, you don't seem to use the StringBuilder; I guess you're using it to build the output. As that should depend only on n, not on the size of the file, I don't see why it would be a problem to use StringBuilder.
Upvotes: 0