Reputation: 13188
I have a text file I use to hold an index of files and words (with their frequencies) that appear in them. I need to read the file into memory and store the words so they can be searched. The file is formatted as follows:
<files> 169
0:file0.txt
1:file1.txt
2:file2.txt
3:file3.txt
... etc ...
</files>
<list> word 2
9: 10
1: 2
</list>
<list> word2 4
3: 19
5: 12
0: 2
8: 2
</list>
... etc ...
The problem is that this index file can become extremely large and won't all fit into memory at once. My solution is to keep only a handful of words in a hash table at any one time; when I need the data for another word, I would evict an old word and parse the new word's data from the file.
How can I efficiently accomplish this in C? I was thinking I would have to do something with fseek and rewinding once I got to certain points.
Thanks,
Mike
Upvotes: 1
Views: 379
Reputation: 13188
It turned out that the best way to do this (for my needs) was to keep a pointer to the current location in the file and then call rewind( FILE *f ); when I reached the end.
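For anyone who wants to try the same thing, here is a minimal sketch of that wrap-around scan. The function name find_list, the 256-byte line buffer and the 63-character word limit are my own assumptions for illustration, not part of the original program.

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

/* Hypothetical helper: scan forward from the current file position for a
   "<list> target N" header. On end of file, rewind once and keep scanning
   until we are back where we started.
   Returns 1 with f positioned just after the header line and *count set
   to N; returns 0 if the word is not in the file (*count is then
   meaningless). Assumes lines are shorter than 256 bytes. */
int find_list(FILE *f, const char *target, int *count)
{
    char line[256], word[64];
    long start;
    int wrapped = 0;

    assert(f && target && count);
    start = ftell(f);
    if (start < 0)
        return 0;

    for (;;) {
        if (!fgets(line, sizeof line, f)) {
            if (wrapped)
                return 0;          /* second time at EOF: give up */
            rewind(f);             /* wrap around to the beginning */
            wrapped = 1;
            continue;
        }
        if (sscanf(line, "<list> %63s %d", word, count) == 2 &&
            strcmp(word, target) == 0)
            return 1;
        if (wrapped && ftell(f) >= start)
            return 0;              /* full circle without a match */
    }
}
```

After a successful call, the "file: count" lines can be read with fgets until the closing </list> line. A lookup costs at most one full pass over the file, so this only pays off when consecutive lookups tend to be near each other.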
Upvotes: 0
Reputation: 2105
As mattnz pointed out, this is best achieved with a separate database layer. You can try SQLite: it needs almost no setup and is very stable. Otherwise, if you want to do this in C, you can put a header at the beginning of the file with links/indexes to each section of the file, a section being <files>..</files> or <list>..</list>. This is just off the top of my head. Any book on implementing databases describes many more techniques.
Upvotes: 2
Reputation: 516
Although C has poor string support, the sample has a distinct pattern as far as I can tell, so re-parsing it from disk would be practical.
I would, however, consider converting the file into a database and working from there. Unless there is a reason not to, pull in a third-party database engine.
If you decide to re-parse the text file, it does not look too difficult. On a first pass, store the start location of each list as a (word, file offset) pair. After that, all you do is seek to the stored offset to read the data for a particular word.
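A rough sketch of that two-step approach, assuming the format shown in the question. The names list_pos, build_offsets, read_list and MAX_WORD are made up for illustration, and lines are assumed to be shorter than 256 bytes.

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

#define MAX_WORD 64   /* assumed upper bound on word length */

/* One index entry: a word and the byte offset of its "<list> ..." line. */
struct list_pos {
    char word[MAX_WORD];
    long offset;
};

/* First pass: record where each <list> block starts.
   Returns the number of entries stored (at most cap). */
size_t build_offsets(FILE *f, struct list_pos *out, size_t cap)
{
    char line[256];
    size_t n = 0;

    assert(f && out);
    rewind(f);
    while (n < cap) {
        long pos = ftell(f);           /* offset of the line about to be read */
        if (pos < 0 || !fgets(line, sizeof line, f))
            break;
        if (sscanf(line, "<list> %63s", out[n].word) == 1)
            out[n++].offset = pos;
    }
    return n;
}

/* Later: seek to a stored offset and read the "file: count" pairs.
   Returns the number of pairs read, or -1 on error. */
int read_list(FILE *f, long offset, int *file_ids, int *counts, int cap)
{
    char line[256];
    int n = 0;

    assert(f && file_ids && counts);
    if (fseek(f, offset, SEEK_SET) != 0)
        return -1;
    if (!fgets(line, sizeof line, f) || strncmp(line, "<list> ", 7) != 0)
        return -1;                     /* offset does not point at a header */
    while (n < cap && fgets(line, sizeof line, f) &&
           sscanf(line, "%d: %d", &file_ids[n], &counts[n]) == 2)
        n++;                           /* stops at "</list>" */
    return n;
}
```

The offsets array is small compared to the data itself (one entry per word), so it can stay in memory, or go into the hash table from the question, while the lists are loaded only on demand.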
If your efficiency concern is how long it will take the computer to do the parsing, forget it and work out what is easiest for you. Don't optimize until you know you need to. Computers are fast and cheap; programmers aren't.
Upvotes: 1