Sudharshan

Reputation: 31

Java code to tail the last n lines of a file, equivalent to the tail command in Unix

Following is the code I wrote to tail the last n lines of a file.

import java.io.RandomAccessFile;
import java.util.HashMap;
import java.util.Map;

class TailCommand {
    public static void main(String[] args) {
        try {
            /*
             * Receive the file name and the number of lines to tail as
             * command-line arguments
             */
            RandomAccessFile randomFile = new RandomAccessFile(args[0], "r");
            long numberOfLines = Long.parseLong(args[1]);
            long lineno = 0;
            String str;
            String outstr;
            StringBuilder sb = new StringBuilder();
            Map<Long, String> strmap = new HashMap<Long, String>();
            // Read the whole file, storing each line under its 1-based line number
            while ((str = randomFile.readLine()) != null) {
                strmap.put(lineno + 1, str);
                lineno++;
            }
            System.out.println("Total no of lines in file is " + lineno);
            // The +1 here avoids printing numberOfLines + 1 lines
            long startPosition = lineno - numberOfLines + 1;
            while (startPosition <= lineno) {
                if (strmap.containsKey(startPosition)) {
                    outstr = strmap.get(startPosition);
                    sb.append(outstr);
                    System.out.println(outstr);
                }
                startPosition++;
            }
        } catch (Exception e) {
            e.printStackTrace();
        }
    }
}

I used the following approach (the file and the number of lines to tail are received as command-line arguments):

  1. Use the readLine method to count the total number of lines in the file.
  2. Increment a counter on each readLine call.
  3. Store this counter and the string returned by readLine in a HashMap.
  4. As a result, the whole file is stored in the HashMap.
  5. The HashMap keys can then be used to retrieve the file's contents from a specific line number.
  6. A StringBuilder can be used to print the selection from a particular line.

My doubts:

Is my approach valid, and can I use it for large files, say greater than 10 MB? What improvements do I need to make if more people have to tail the same file simultaneously? May I use StringBuilder for larger files as well?

Upvotes: 2

Views: 7844

Answers (5)

Esko

Reputation: 29377

As mentioned in my comment to djna's answer, you're not doing this very efficiently:

  • You're reading in the whole file. If the file is large and the number of lines to tail is small, you're just wasting time, I/O and what have you.
  • Additionally, you're wasting memory.
  • There's no buffering (besides whatever RandomAccessFile#readLine() may or may not provide), which can also slow things down.

So, what I'd do is read the file from the end backwards in chunks and process each chunk separately.

// Note: `file`, `linesToRead` (the n from the question) and `pw` (a
// PrintWriter for the output) are assumed to be defined by the caller
RandomAccessFile raf = new RandomAccessFile(new File(file), "r");
List<String> lines = new ArrayList<String>();

final int chunkSize = 1024 * 32;
long end = raf.length();
boolean readMore = true;
while (readMore) {
    byte[] buf = new byte[chunkSize];

    // Read a chunk from the end of the file
    long startPoint = end - chunkSize;
    long readLen = chunkSize;
    if (startPoint < 0) {
        readLen = chunkSize + startPoint;
        startPoint = 0;
    }
    raf.seek(startPoint);
    readLen = raf.read(buf, 0, (int) readLen);
    if (readLen <= 0) {
        break;
    }

    // Parse newlines and add the lines to the list (newest first)
    int unparsedSize = (int) readLen;
    int index = unparsedSize - 1;
    while (index >= 0) {
        if (buf[index] == '\n') {
            int startOfLine = index + 1;
            int len = unparsedSize - startOfLine;
            if (len > 0) {
                lines.add(new String(buf, startOfLine, len));
            }
            unparsedSize = index + 1;
        }
        --index;
    }

    // The first line of the chunk is left unparsed because it could be a
    // partial line; once we have hit the start of the file it is the
    // file's first line, so add it now
    if (startPoint == 0 && unparsedSize > 0) {
        lines.add(new String(buf, 0, unparsedSize));
    }

    // Move the end point back past the lines we parsed; the unparsed
    // partial line is re-read as part of the next chunk
    end = end - (readLen - unparsedSize);

    readMore = lines.size() < linesToRead && startPoint != 0;
}

// Only print the requested number of lines
if (linesToRead > lines.size()) {
    linesToRead = lines.size();
}

for (int i = linesToRead - 1; i >= 0; --i) {
    pw.print(lines.get(i));
}

Upvotes: 6

Sudharshan

Reputation: 31

I have modified the code based on the above suggestions. Please see the updated code below.

The logic used is described below:

1. Seek to the end of the file using the file's length.
2. Move the file pointer backwards from EOF, checking each byte for an occurrence of '\n'.
3. If a '\n' is found, increment the line counter and put the output of readLine into a HashMap.
4. Retrieve the values from the HashMap in descending key order.

I hope the above approach avoids the memory problems and is clear. Please suggest.

import java.io.RandomAccessFile;
import java.util.HashMap;
import java.util.Map;

class NewTailCommand {
    public static void main(String[] args) {
        Map<Long, String> strmap = new HashMap<Long, String>();
        long numberOfLines = Long.parseLong(args[1]);
        try {
            /*
             * Receive the file name and the number of lines to tail as
             * command-line arguments
             */
            RandomAccessFile randomFile = new RandomAccessFile(args[0], "r");

            long filelength = randomFile.length();
            long filepos = filelength - 1;
            long linescovered;
            // Walk backwards from EOF looking for '\n'; the filepos >= 0
            // check stops us from running past the start of the file
            for (linescovered = 1; linescovered <= numberOfLines && filepos >= 0; filepos--) {
                randomFile.seek(filepos);
                if (randomFile.readByte() == 0xA) {
                    if (filepos == filelength - 1)
                        continue; // skip the newline that terminates the file
                    strmap.put(linescovered, randomFile.readLine());
                    linescovered++;
                }
            }
            // The first line of the file has no preceding '\n', so read it
            // explicitly if we reached the start before finding enough lines
            if (linescovered <= numberOfLines && filepos < 0) {
                randomFile.seek(0);
                strmap.put(linescovered, randomFile.readLine());
            }
        } catch (Exception e) {
            e.printStackTrace();
        }
        // Print in file order; decrement outside the containsKey check so a
        // missing key cannot cause an infinite loop
        long startPosition = numberOfLines;
        while (startPosition != 0) {
            if (strmap.containsKey(startPosition)) {
                System.out.println(strmap.get(startPosition));
            }
            startPosition--;
        }
    }
}

Upvotes: 0

aroth

Reputation: 54816

Is my approach valid, and can I use it for large files, say greater than 10 MB?

Yes, it is valid. And yes, you "can" use it for larger files, but since you always scan the entire file, performance will degrade as the file grows. Similarly, since you store the whole file in memory, your memory requirements will grow to the point where a very large file starts causing OutOfMemoryError issues.

What improvements do I need to make if more people have to tail the same file simultaneously?

None, since you are only tailing the last n lines; each person can simply run their own instance of the program. If you wanted to follow the file as updates are made over time (like tail -f does), then you'd have to make some changes.
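
A hypothetical sketch of such a follow mode, just to illustrate the kind of change involved (the class name and polling interval are my own, not from the answer): poll the file for growth and print whatever was appended since the last check.

import java.io.RandomAccessFile;

class FollowTail {
    public static void main(String[] args) throws Exception {
        RandomAccessFile raf = new RandomAccessFile(args[0], "r");
        long pos = raf.length(); // start at the current end of the file
        while (true) {
            long len = raf.length();
            if (len > pos) {
                // New content was appended; print it and remember how far we got
                raf.seek(pos);
                String line;
                while ((line = raf.readLine()) != null) {
                    System.out.println(line);
                }
                pos = raf.getFilePointer();
            }
            Thread.sleep(1000); // poll once per second
        }
    }
}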

May I use StringBuilder for larger files also?

Of course you may, but it's not clear to me what you would gain.

Personally I would recommend restructuring your algorithm as follows:

  1. Seek to the end of the file.
  2. Parse backwards until you have encountered the required number of \n characters.
  3. Read forwards to the end of the file, printing as you go.

Then there's no need to buffer each line of the file, and performance doesn't degrade on very large files.
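
A minimal sketch of that restructured algorithm (class and variable names are mine; it assumes the file path and line count arrive as command-line arguments):

import java.io.RandomAccessFile;

class SeekTail {
    public static void main(String[] args) throws Exception {
        RandomAccessFile raf = new RandomAccessFile(args[0], "r");
        int n = Integer.parseInt(args[1]);
        long length = raf.length();
        long pos = length - 1;
        int newlines = 0;
        // Steps 1 and 2: walk backwards from EOF counting '\n' characters,
        // ignoring the newline that terminates the file itself
        while (pos > 0) {
            raf.seek(pos);
            if (raf.readByte() == '\n' && pos != length - 1) {
                if (++newlines == n) {
                    pos++; // the wanted lines start just after this '\n'
                    break;
                }
            }
            pos--;
        }
        // Step 3: read forwards to the end of the file, printing as we go
        raf.seek(Math.max(pos, 0));
        String line;
        while ((line = raf.readLine()) != null) {
            System.out.println(line);
        }
        raf.close();
    }
}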

Upvotes: 3

Petar Ivanov

Reputation: 93040

You basically read the whole file into memory; to do that you don't need a random access file, really.

If the file is huge, that might not be the best option.

Why not use the HashMap to store (line number -> position in the file) instead of (line number -> line)? This way you would know which position to seek to for the last n lines.
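
A rough sketch of that variant (names are mine): one pass records each line's starting byte offset, then a single seek jumps back to the first wanted line.

import java.io.RandomAccessFile;
import java.util.HashMap;
import java.util.Map;

class OffsetTail {
    public static void main(String[] args) throws Exception {
        RandomAccessFile raf = new RandomAccessFile(args[0], "r");
        long n = Long.parseLong(args[1]);

        // Pass 1: store (line number -> byte offset), not the line text
        Map<Long, Long> offsets = new HashMap<Long, Long>();
        long lineNo = 0;
        long pos = 0;
        while (raf.readLine() != null) {
            offsets.put(++lineNo, pos);
            pos = raf.getFilePointer();
        }

        // Pass 2: seek to the first of the last n lines and print to EOF
        if (lineNo > 0) {
            raf.seek(offsets.get(Math.max(1, lineNo - n + 1)));
            String line;
            while ((line = raf.readLine()) != null) {
                System.out.println(line);
            }
        }
        raf.close();
    }
}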

Another way would be to use a buffer (an array) of n strings holding the last n lines seen so far. But be careful: when reading a new line you don't want to shift all the elements in the buffer (i.e. 1->0, 2->1, ..., n->(n-1)) and then add the new line at the end. Use a cyclic buffer instead: keep an index into the buffer for the end position and overwrite the next position when adding a new line; if you are at position n-1, the next is 0 (hence cyclic).
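
A short sketch of that cyclic buffer (names are mine; assumes n is at least 1):

import java.io.BufferedReader;
import java.io.FileReader;

class RingTail {
    public static void main(String[] args) throws Exception {
        int n = Integer.parseInt(args[1]);
        String[] ring = new String[n];
        long count = 0;

        BufferedReader reader = new BufferedReader(new FileReader(args[0]));
        String line;
        while ((line = reader.readLine()) != null) {
            ring[(int) (count % n)] = line; // overwrite the oldest slot, no shifting
            count++;
        }
        reader.close();

        // Print the buffered lines in file order, oldest first
        for (long i = Math.max(0, count - n); i < count; i++) {
            System.out.println(ring[(int) (i % n)]);
        }
    }
}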

Upvotes: 0

djna

Reputation: 55907

It seems like you're keeping the whole file in memory when you only need to keep the last n lines. So instead allocate an array of size n and use it as a ring buffer.

In the code you show you don't seem to use the StringBuilder; I guess you're using it to build the output. As that should depend only on n, not on the size of the file, I don't see why using a StringBuilder should be a problem.

Upvotes: 0
