Reputation: 2989
Is there a way to perform binary search on a file stored in UTF format in sorted order. I am able to perform binary search on a text file using RandomAccessFile. First I find out the length of the file and then jump to the middle position of the file using fseek, after jumping to the middle position I read the bytes. However, I am not finding it feasible for a file stored in UTF format, as the first characters are random in UTF format. And also with DataInputStream I am unable to jump to a particular position in the file. Is it possible to do binary search on such a file. If yes, then using which classes.
Upvotes: 0
Views: 265
Reputation: 78825
Yes, it is possible. If you jump into the middle of a file, you will first need to go to the nearest record separator and then use the text starting after the record separator.
Depending on the exact file format you have, a line feed, a TAB character or something similar could be used as the record separator.
Locating the record separator is easy if it is a character with a Unicode number below 32 (which NL, CR, TAB fulfill). Then you don't need to care about the multibyte UTF-8 encoding (for locating the separator). If it's a wide character Unicode format, then it isn't much more difficult either.
DataInputStream is the wrong class from random access. (Streaming is sort of the opposite of random access.) Have a look at RandomAccessFile instead.
Upvotes: 1