CR41G14

Reputation: 5594

Find in Files C#

I have a Folder which has multiple sub folders. Each sub folder has many .dot and .txt files in them.

Is there a simple solution in C# .NET that will iterate through each file and check the contents of that file for a key phrase or keyword?

Document Name        Keyword1         Keyword2         Keyword3        ...
  test.dot              Y               N                Y

To summarise:

  1. Select a folder
  2. Enter a list of keywords to search for
  3. The program then searches each file and, at the end, outputs something like the table above. I am not too worried about creating the DataTable to show in the DataGrid, as I can do that myself. I just need the find-in-files function itself, similar to Notepad++'s Find in Files option.

Thanks in advance

Upvotes: 3

Views: 2966

Answers (4)

Joey Gennari

Reputation: 2361

Here's a way using Tim's original answer to get the line number:

var keyWords = new[] { "Keyword1", "Keyword2", "Keyword3" };
var allDotFiles = Directory.EnumerateFiles(folder, "*.dot", SearchOption.AllDirectories);
var allTxtFiles = Directory.EnumerateFiles(folder, "*.txt", SearchOption.AllDirectories);
var allFiles = allDotFiles.Concat(allTxtFiles);
var allMatches = from fn in allFiles
                 // The index passed to Select is zero-based, so add 1 to get a line number.
                 from line in File.ReadLines(fn).Select((item, index) => new { LineNumber = index + 1, Line = item })
                 from kw in keyWords
                 where line.Line.Contains(kw)
                 select new
                 {
                     File = fn,
                     Line = line.Line,
                     LineNumber = line.LineNumber,
                     Keyword = kw
                 };

foreach (var matchInfo in allMatches)
    Console.WriteLine("File => {0} Line => {1} Keyword => {2} Line Number => {3}"
        , matchInfo.File, matchInfo.Line, matchInfo.Keyword, matchInfo.LineNumber);

Upvotes: 0

Tim Schmelter

Reputation: 460108

You can use Directory.EnumerateFiles with a search pattern and the recursive option (SearchOption.AllDirectories). The rest is easy with LINQ:

var keyWords = new[] { "Keyword1", "Keyword2", "Keyword3" };
var allDotFiles = Directory.EnumerateFiles(folder, "*.dot", SearchOption.AllDirectories);
var allTxtFiles = Directory.EnumerateFiles(folder, "*.txt", SearchOption.AllDirectories);
var allFiles = allDotFiles.Concat(allTxtFiles);
var allMatches = from fn in allFiles
                 from line in File.ReadLines(fn)
                 from kw in keyWords
                 where line.Contains(kw)
                 select new { 
                     File = fn,
                     Line = line,
                     Keyword = kw
                 };

foreach (var matchInfo in allMatches)
    Console.WriteLine("File => {0} Line => {1} Keyword => {2}"
        , matchInfo.File, matchInfo.Line, matchInfo.Keyword);

Note that you need to add using System.Linq; (and using System.IO; for Directory and File).

Is there a way just to get the line number?

If you just want the line numbers you can use this query:

var matches = allFiles.Select(fn => new
{
    File = fn,
    LineIndices = String.Join(",",
                File.ReadLines(fn)
                .Select((l,i) => new {Line=l, Index =i})
                .Where(x => keyWords.Any(w => x.Line.Contains(w)))
                .Select(x => x.Index + 1)), // Index is zero-based; add 1 for a line number
})
.Where(x => x.LineIndices.Any());

foreach (var match in matches)
    Console.WriteLine("File => {0} Linenumber => {1}"
        , match.File, match.LineIndices);

It's a little more difficult because LINQ's query syntax has no way to access the element index; that requires the method-syntax overload of Select.
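The question asks for one Y/N flag per keyword per file rather than a list of matching lines. A minimal sketch of that shape, assuming plain-text files and case-sensitive matching; the KeywordMatrix class and its Build and FormatRow methods are hypothetical names, not part of .NET:

```csharp
using System;
using System.Collections.Generic;
using System.IO;
using System.Linq;

static class KeywordMatrix
{
    // Hypothetical helper: maps each file path to one flag per keyword.
    public static Dictionary<string, bool[]> Build(string folder, string[] keyWords)
    {
        var files = Directory.EnumerateFiles(folder, "*.dot", SearchOption.AllDirectories)
            .Concat(Directory.EnumerateFiles(folder, "*.txt", SearchOption.AllDirectories));

        var result = new Dictionary<string, bool[]>();
        foreach (var fn in files)
        {
            var found = new bool[keyWords.Length];
            foreach (var line in File.ReadLines(fn))
            {
                for (int i = 0; i < keyWords.Length; i++)
                {
                    if (!found[i] && line.Contains(keyWords[i]))
                        found[i] = true;
                }

                // Every keyword has been seen, so stop reading this file.
                if (found.All(f => f))
                    break;
            }
            result[fn] = found;
        }
        return result;
    }

    // Formats one row as "path Y N Y".
    public static string FormatRow(string file, bool[] found)
    {
        return file + " " + string.Join(" ", found.Select(f => f ? "Y" : "N"));
    }
}
```

Each entry of the returned dictionary can be written out with FormatRow or copied into a DataTable row.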

Upvotes: 3

Viktor Latypov

Reputation: 14467

The first step: locate all files and read their contents. This is easily done with System.IO.Directory.GetFiles() plus System.IO.File.ReadAllText(), as others have mentioned.

The second step: find the keywords in a file. With a single keyword this is simple and can be done with the IndexOf() method, but scanning a file once per keyword (especially if the file is big) is a waste.

To quickly find multiple keywords in a text I think you should use the Aho-Corasick automaton (algorithm). See the C# implementation at CodeProject: http://www.codeproject.com/Articles/12383/Aho-Corasick-string-matching-in-C
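For illustration, here is a compact Aho-Corasick sketch that reports which keywords occur in a text after a single pass. It is not the CodeProject implementation linked above; the KeywordAutomaton class and its Scan method are hypothetical names, and keywords are assumed to be non-empty and matched case-sensitively:

```csharp
using System;
using System.Collections.Generic;

// Minimal Aho-Corasick sketch; class and member names are hypothetical.
sealed class KeywordAutomaton
{
    private sealed class Node
    {
        public readonly Dictionary<char, Node> Next = new Dictionary<char, Node>();
        public Node Fail;                                  // longest proper suffix that is also a trie path
        public readonly List<int> Hits = new List<int>();  // indices of keywords ending at this node
    }

    private readonly Node _root = new Node();
    private readonly int _count;

    public KeywordAutomaton(IReadOnlyList<string> keywords)
    {
        _count = keywords.Count;

        // Phase 1: build a trie of all keywords.
        for (int i = 0; i < keywords.Count; i++)
        {
            var node = _root;
            foreach (char c in keywords[i])
            {
                if (!node.Next.TryGetValue(c, out var child))
                    node.Next[c] = child = new Node();
                node = child;
            }
            node.Hits.Add(i);
        }

        // Phase 2: breadth-first pass that sets the failure links.
        var queue = new Queue<Node>();
        foreach (var child in _root.Next.Values)
        {
            child.Fail = _root;
            queue.Enqueue(child);
        }
        while (queue.Count > 0)
        {
            var node = queue.Dequeue();
            foreach (var pair in node.Next)
            {
                var fail = node.Fail;
                while (fail != _root && !fail.Next.ContainsKey(pair.Key))
                    fail = fail.Fail;
                pair.Value.Fail = fail.Next.TryGetValue(pair.Key, out var target) ? target : _root;
                // Inherit keywords that end at the failure target.
                pair.Value.Hits.AddRange(pair.Value.Fail.Hits);
                queue.Enqueue(pair.Value);
            }
        }
    }

    // Returns one flag per keyword: true if that keyword occurs anywhere in text.
    public bool[] Scan(string text)
    {
        var found = new bool[_count];
        var node = _root;
        foreach (char c in text)
        {
            while (node != _root && !node.Next.ContainsKey(c))
                node = node.Fail;
            if (node.Next.TryGetValue(c, out var next))
                node = next;
            foreach (int i in node.Hits)
                found[i] = true;
        }
        return found;
    }
}
```

The automaton is built once per keyword list and can then be reused for every file, for example by calling Scan on each line and combining the flags.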

Upvotes: 2

user1182183

Reputation:

What you want is to recursively iterate over the files in a directory (and maybe its subdirectories).

So your steps would be to loop over each file in the specified directory with Directory.GetFiles() from .NET, and then repeat the process for each subdirectory you encounter.

This can be easily done with this code sample:

public static IEnumerable<string> GetFiles(string path)
{
    // Files directly inside this directory.
    foreach (string s in Directory.GetFiles(path, "*.extension_here"))
    {
        yield return s;
    }

    // Recurse into each subdirectory.
    foreach (string s in Directory.GetDirectories(path))
    {
        foreach (string s1 in GetFiles(s))
        {
            yield return s1;
        }
    }
}

A more in-depth look at iterating through files in directories in .NET is located here:

http://blogs.msdn.com/b/brada/archive/2004/03/04/84069.aspx

Then use the IndexOf method of String to check whether your keywords are in the file. I discourage the use of ReadAllText: if your file is 5 MB, your string will be too. Reading line by line is less memory-hungry.
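A minimal sketch of that line-by-line check, assuming a case-insensitive match is wanted; the FileSearch class and ContainsKeyword method are hypothetical names:

```csharp
using System;
using System.IO;

static class FileSearch
{
    // Hypothetical helper: returns true as soon as any line contains the keyword.
    public static bool ContainsKeyword(string path, string keyword)
    {
        // File.ReadLines streams the file lazily, one line at a time,
        // so the whole file is never held in memory as a single string.
        foreach (string line in File.ReadLines(path))
        {
            // Assumption: case-insensitive matching; use StringComparison.Ordinal otherwise.
            if (line.IndexOf(keyword, StringComparison.OrdinalIgnoreCase) >= 0)
                return true;
        }
        return false;
    }
}
```

Because each line is checked separately, a keyword that spans a line break will not be found by this sketch.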

Upvotes: 5
