Reputation: 1093
I am calling File.ReadLines() on the same few files frequently, and I don't know how much overhead reading a file this way carries.
For each file, I search for its id (hash) within a txt file.
At the moment I am using the code below, but I wonder whether I should cache these index files. My hesitation is that the files will be edited so often that reloading each one into the cache after every change may cost as much as reading it directly. It is much more likely that I will be adding a line to the text file on each iteration (there will not be a match).
foreach (var myfile in allfiles) // roughly 5 thousand
{
    ...
    // Path.Combine avoids the invalid "\i" escape in "\index.txt"
    foreach (var line in File.ReadLines(Path.Combine(myfile.path, "index.txt")))
    {
        // compare the line to the current record's hash
        if (myfile.hash.Equals(line))
        {
            ...
            return x;
        }
    }
    ...
    // otherwise add a new line (a hash) to index.txt
}
...
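The uncached lookup-or-append loop from the question can be sketched as one self-contained method. This is a minimal sketch, not the asker's actual code: the class `IndexFile`, the method `ContainsOrAppend`, and the use of `File.AppendAllText` for the "add a new line" step are assumptions for illustration. It shows where the cost of `File.ReadLines()` comes from: the file is streamed lazily, so a match stops the read early, but every call reopens the file and scans it from the start, which is the worst case when (as expected here) there is usually no match.

```csharp
using System;
using System.IO;
using System.Linq;

// Hypothetical helper illustrating the uncached approach from the question.
public static class IndexFile
{
    // Returns true if `hash` already appears as a line in the index file;
    // otherwise appends it and returns false.
    // File.ReadLines streams lazily, so Enumerable.Contains stops at the first
    // match, but each call reopens the file and re-reads it from the beginning.
    public static bool ContainsOrAppend(string indexPath, string hash)
    {
        if (File.Exists(indexPath) && File.ReadLines(indexPath).Contains(hash))
            return true;

        // No match: the whole file was read, now append the new hash.
        File.AppendAllText(indexPath, hash + Environment.NewLine);
        return false;
    }
}
```

With roughly 5,000 records and mostly misses, this reads each index file in full about 5,000 times, which is the cost a cache would remove.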
There are about 5-10 index.txt files at different paths, and which one needs to be checked depends on the file, so each of them would need to be cached.
Is caching the index.txt file a better idea? Does File.ReadLines() have a lot of overhead?
Thanks for any pointers.
Upvotes: 2
Views: 1845
Reputation: 186698
If you have many files that are short enough, caching looks reasonable:
// Simplest, not thread safe. Note: the cache is never refreshed, so lines
// appended to a file after its first read are not seen.
private static Dictionary<String, String[]> s_Files =
    new Dictionary<string, string[]>(StringComparer.OrdinalIgnoreCase);
private static IEnumerable<String> ReadLines(String path) {
    String[] lines;
    if (s_Files.TryGetValue(path, out lines))
        return lines;
    else {
        lines = File.ReadAllLines(path);
        s_Files.Add(path, lines);
        return lines;
    }
}
...
foreach (var myfile in allfiles) {
    ...
    // Note "ReadLines" instead of "File.ReadLines"
    foreach (var line in ReadLines(Path.Combine(myfile.path, "index.txt"))) {
        // compare the line to myfile.hash as before
    }
}
Benchmark both implementations (your current one and this cached routine) and then decide whether caching is worth it.
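Because the question expects a new hash to be appended on most iterations, a cache that is never refreshed goes stale quickly. One way to handle that, sketched here under the assumption that only this process writes to the index files, is to update the cached copy at the same moment the line is appended. The class `CachedIndex` and method `ContainsOrAppend` are hypothetical names, and this swaps the answer's `String[]` for a `HashSet<string>`, so a lookup is a hash probe rather than a linear scan.

```csharp
using System;
using System.Collections.Generic;
using System.IO;

// Hypothetical write-through cache: one HashSet of hashes per index file.
// Not thread safe, like the Dictionary-based cache in the answer.
public static class CachedIndex
{
    private static readonly Dictionary<string, HashSet<string>> s_Index =
        new Dictionary<string, HashSet<string>>(StringComparer.OrdinalIgnoreCase);

    // Returns true if `hash` is already in the index; otherwise appends it to
    // both the file and the cached set, so our own writes never make it stale.
    public static bool ContainsOrAppend(string indexPath, string hash)
    {
        if (!s_Index.TryGetValue(indexPath, out var hashes))
        {
            // First access to this index file: load it once.
            hashes = File.Exists(indexPath)
                ? new HashSet<string>(File.ReadLines(indexPath))
                : new HashSet<string>();
            s_Index.Add(indexPath, hashes);
        }

        if (hashes.Contains(hash))
            return true;

        File.AppendAllText(indexPath, hash + Environment.NewLine);
        hashes.Add(hash);
        return false;
    }
}
```

Each index file is read once per process; after that, both hits and misses are served from memory, and only the append touches the disk. Edits made by other processes are not detected.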
Upvotes: 3
Reputation: 3319
I would recommend the following:
Store in memory the last-write timestamp of each hash file.
Cache the contents of the hash files.
On each cache access, check whether the file's last-write timestamp is newer than the one stored in memory; if it is, reload the file.
Use ConcurrentDictionary instead of Dictionary, so the cache is thread safe.
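The four recommendations above can be combined into a single cache. This is a minimal sketch, not a definitive implementation: the class `TimestampedCache` and its `Entry` type are hypothetical, and it assumes `File.GetLastWriteTimeUtc` changes whenever the file is written. File-system timestamp resolution varies, so two writes very close together may not be distinguishable by timestamp alone.

```csharp
using System;
using System.Collections.Concurrent;
using System.IO;

// Hypothetical cache that re-reads a file only when its timestamp is newer.
public static class TimestampedCache
{
    // Cached file contents plus the last-write time recorded when they were read.
    private sealed class Entry
    {
        public Entry(DateTime lastWriteUtc, string[] lines)
        {
            LastWriteUtc = lastWriteUtc;
            Lines = lines;
        }
        public DateTime LastWriteUtc { get; }
        public string[] Lines { get; }
    }

    private static readonly ConcurrentDictionary<string, Entry> s_Files =
        new ConcurrentDictionary<string, Entry>(StringComparer.OrdinalIgnoreCase);

    // Returns the file's lines, reloading only when the file's last-write
    // timestamp is newer than the one stored with the cached copy.
    public static string[] ReadLines(string path)
    {
        // Read the timestamp BEFORE the contents (see note after the block).
        DateTime lastWrite = File.GetLastWriteTimeUtc(path);

        if (s_Files.TryGetValue(path, out var cached) && cached.LastWriteUtc >= lastWrite)
            return cached.Lines;

        var fresh = new Entry(lastWrite, File.ReadAllLines(path));
        s_Files[path] = fresh; // concurrent reloads may race; last writer wins
        return fresh.Lines;
    }
}
```

Reading the timestamp before the contents is deliberate: if the file changes in between, the entry is stored with the older timestamp, so the next access reloads it instead of serving stale lines. The cost of a hit is one metadata query per access rather than a full read.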
Upvotes: 0