creatiive

Reputation: 1093

C# caching a txt file or using File.ReadLines

I call File.ReadLines() on the same few files repeatedly and don't know how much overhead reading a file this way carries.

For each file, I search for its id (a hash) in a txt file.

At the moment I am using the code below, but I wonder whether I should cache these index files. My hesitation is that the files are edited so often that reloading them into the cache each time may cost just as much. In most iterations there will be no match, so I will be appending a line to the text file.

foreach (var myfile in allfiles) // roughly 5 thousand
{
...

    // Path.Combine avoids the invalid "\i" escape in "\index.txt"
    foreach (var line in File.ReadLines(Path.Combine(myfile.path, "index.txt")))
    {
        // compare the line to the current record's hash
        if (myfile.hash.Equals(line))
            ...
            return x;

    }
...
// otherwise add a new line (a hash) to index.txt
}

...

There are about 5-10 index.txt files at different paths that need to be checked depending on the file... so each one would need to be cached.

Is caching the index.txt file a better idea? Does File.ReadLines() have a lot of overhead?

Thanks for any pointers.

Upvotes: 2

Views: 1845

Answers (2)

Dmitrii Bychenko

Reputation: 186698

If you have many files that are short enough, caching looks reasonable:

  // Simplest, not thread safe
  private static Dictionary<String, String[]> s_Files = 
    new Dictionary<string, string[]>(StringComparer.OrdinalIgnoreCase);

  private static IEnumerable<String> ReadLines(String path) {
    String[] lines;

    if (s_Files.TryGetValue(path, out lines))
      return lines;
    else {
      lines = File.ReadAllLines(path);

      s_Files.Add(path, lines);

      return lines;   
    }
  }

  ...

  foreach (var myfile in allfiles) {
    ...
    // Note "ReadLines" instead of "File.ReadLines"
    foreach (var line in ReadLines(Path.Combine(myfile.path, "index.txt"))) {
    }
  }

Compare both implementations, your current one and this cached routine, and then decide whether you want to cache.
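One way to run that comparison is a rough Stopwatch timing. The sketch below is only an illustration: `ReadTiming`, `Measure` and `Compare` are hypothetical names, not part of the code above, and the numbers will depend heavily on file size and on whether the OS already holds the file in its own cache.

```csharp
using System;
using System.Diagnostics;
using System.IO;

// Hypothetical helper for a rough comparison; not a rigorous benchmark.
public static class ReadTiming
{
    // Runs the action `iterations` times and returns the total elapsed time.
    public static TimeSpan Measure(Action action, int iterations)
    {
        var sw = Stopwatch.StartNew();
        for (int i = 0; i < iterations; i++)
            action();
        sw.Stop();
        return sw.Elapsed;
    }

    public static void Compare(string path, int iterations)
    {
        // Uncached: streams the file from disk on every pass.
        TimeSpan uncached = Measure(() =>
        {
            int n = 0;
            foreach (var line in File.ReadLines(path)) n++;
        }, iterations);

        // Cached: one read up front, then only in-memory enumeration.
        string[] lines = File.ReadAllLines(path);
        TimeSpan cached = Measure(() =>
        {
            int n = 0;
            foreach (var line in lines) n++;
        }, iterations);

        Console.WriteLine("File.ReadLines: " + uncached.TotalMilliseconds + " ms");
        Console.WriteLine("Cached array:   " + cached.TotalMilliseconds + " ms");
    }
}
```

Calling `ReadTiming.Compare(somePath, 5000)` on one of the real index.txt files, with an iteration count close to the real workload, should give a first impression of the difference.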

Upvotes: 3

Riad Baghbanli

Reputation: 3319

I would recommend the following:

  1. Keep the last-updated timestamp of each hash file in memory.

  2. Cache the contents of the hash files.

  3. On each cache access, check whether the file's last-updated timestamp is newer than the one stored in memory, and reload the file if it is.

  4. Use ConcurrentDictionary instead of Dictionary.
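The four steps above could be sketched roughly as follows. This is a minimal illustration under assumptions: `TimestampedLineCache` and `Entry` are hypothetical names, and the file's last-write time (`File.GetLastWriteTimeUtc`) is assumed to be a good enough "last updated" signal.

```csharp
using System;
using System.Collections.Concurrent;
using System.IO;

// Hypothetical helper: caches each file's lines together with the
// last-write timestamp observed when they were loaded.
public static class TimestampedLineCache
{
    private sealed class Entry
    {
        public DateTime LastWriteUtc;
        public string[] Lines;
    }

    // Step 4: ConcurrentDictionary instead of Dictionary.
    private static readonly ConcurrentDictionary<string, Entry> s_cache =
        new ConcurrentDictionary<string, Entry>(StringComparer.OrdinalIgnoreCase);

    public static string[] ReadLines(string path)
    {
        // Step 3: compare the file's current timestamp with the stored one.
        DateTime stamp = File.GetLastWriteTimeUtc(path);

        Entry cached;
        if (s_cache.TryGetValue(path, out cached) && cached.LastWriteUtc >= stamp)
            return cached.Lines;

        // Missing or stale: reload the file and remember its timestamp (steps 1 and 2).
        var fresh = new Entry { LastWriteUtc = stamp, Lines = File.ReadAllLines(path) };
        s_cache[path] = fresh;
        return fresh.Lines;
    }
}
```

The timestamp is read before the file contents, so a write that lands in between only causes one extra reload on the next call. Two writes that fall within the file system's timestamp resolution could still go unnoticed, so code that appends to index.txt itself may prefer to update the cached entry directly.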

Upvotes: 0
