Reputation: 1909
My task is to provide random read access to a very large (50 GB+) ASCII text file (serving requests for the nth line, or the nth word of the nth line) in the form of a C# console app.
After googling and reading for a few days, I've arrived at the following implementation plan:
Since StreamReader is good at sequential access, use it to build an index of the lines/words in the file (a List<List<long>> map, where map[i][j] is the position where the jth word of the ith line starts). Then use that index to access the file through a MemoryMappedFile, since it is good at providing random access.
Are there some obvious flaws in the solution? Would it be optimal for a given task?
UPD: It will be executed at 64bit system.
Upvotes: 1
Views: 944
Reputation: 6335
I believe your solution is a good start, though a List is not the best map container here: Lists are slow when reading arbitrary elements.
I would test whether a List<List<long>> map is really the best memory/speed tradeoff. Since the OS caches memory maps at page boundaries (4096 bytes on x86/x64), it might actually be faster to index only the address of the start of each line, and then scan the line itself looking for words.
Obviously, this approach only works on a 64-bit OS, but the performance benefit of a memory map is significant: this is one of the few places where going 64-bit matters a lot, namely database-style applications :)
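A sketch of that leaner variant: only line starts are indexed, and the requested word is found by scanning the mapped line. The lineStarts index, the space-separated word format, and the helper name are assumptions for illustration, not part of the answer:

```csharp
using System.Collections.Generic;
using System.IO.MemoryMappedFiles;
using System.Text;

static class MappedReader
{
    // Returns the wordIdx-th word of the lineIdx-th line (both 0-based).
    // lineStarts[i] is the byte offset of line i; fileLength bounds the last line.
    public static string NthWord(MemoryMappedViewAccessor acc,
                                 IReadOnlyList<long> lineStarts, long fileLength,
                                 int lineIdx, int wordIdx)
    {
        long pos = lineStarts[lineIdx];
        long end = lineIdx + 1 < lineStarts.Count ? lineStarts[lineIdx + 1]
                                                  : fileLength;

        for (int w = 0; w < wordIdx; w++)      // skip the preceding words
        {
            while (pos < end && acc.ReadByte(pos) != (byte)' ') pos++;
            while (pos < end && acc.ReadByte(pos) == (byte)' ') pos++;
        }

        var sb = new StringBuilder();
        while (pos < end)
        {
            byte b = acc.ReadByte(pos++);
            if (b == (byte)' ' || b == (byte)'\n' || b == (byte)'\r') break;
            sb.Append((char)b);                // ASCII file, plain cast is safe
        }
        return sb.ToString();
    }
}

// Usage (a 64-bit process is required to map a 50 GB view):
// using var mmf = MemoryMappedFile.CreateFromFile("huge.txt", FileMode.Open);
// using var acc = mmf.CreateViewAccessor(0, 0, MemoryMappedFileAccess.Read);
```

The per-line scan touches at most one or two cached pages, which is why dropping the per-word index can be a net win.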
Upvotes: 2
Reputation: 155523
It seems fine, but if you're using memory mapping then your program will only work on a 64-bit system, because a 50 GB file exceeds the effective 2 GB address space available to a 32-bit process.
You'll be fine with just using a FileStream and calling .Seek() to jump to the selected offset as appropriate, so I don't see a need for memory-mapped files.
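A sketch of this simpler approach, assuming the byte offset of each line is known from a one-time sequential pass and lines end with '\n' (assumptions for illustration):

```csharp
using System.IO;
using System.Text;

static class SeekReader
{
    // Reads the line that starts at the given byte offset.
    public static string ReadLineAt(FileStream fs, long offset)
    {
        fs.Seek(offset, SeekOrigin.Begin);
        var sb = new StringBuilder();
        int b;
        while ((b = fs.ReadByte()) != -1 && b != '\n')
            if (b != '\r') sb.Append((char)b);   // ASCII file, cast is safe
        return sb.ToString();
    }
}

// Usage (lineOffsets is a hypothetical index from the sequential pass):
// using var fs = new FileStream("huge.txt", FileMode.Open, FileAccess.Read);
// string line = SeekReader.ReadLineAt(fs, lineOffsets[n]);
```

Unlike the memory-mapped version, this works identically in 32-bit and 64-bit processes, since Seek takes a 64-bit offset regardless of address space.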
Upvotes: 5