Reputation: 5594
I have a folder with multiple subfolders. Each subfolder contains many .dot and .txt files.
Is there a simple solution in C# .NET that will iterate through each file and check the contents of that file for a key phrase or keyword?
Document Name    Keyword1    Keyword2    Keyword3    ...
test.dot         Y           N           Y
To summarise:
Thanks in advance
Upvotes: 3
Views: 2966
Reputation: 2361
Here's a way using Tim's original answer to get the line number:
var keyWords = new[] { "Keyword1", "Keyword2", "Keyword3" };
var allDotFiles = Directory.EnumerateFiles(folder, "*.dot", SearchOption.AllDirectories);
var allTxtFiles = Directory.EnumerateFiles(folder, "*.txt", SearchOption.AllDirectories);
var allFiles = allDotFiles.Concat(allTxtFiles);
var allMatches = from fn in allFiles
                 // Select's index is zero-based, so add 1 to get a human-readable line number
                 from line in File.ReadLines(fn).Select((item, index) => new { LineNumber = index + 1, Line = item })
                 from kw in keyWords
                 where line.Line.Contains(kw)
                 select new
                 {
                     File = fn,
                     Line = line.Line,
                     LineNumber = line.LineNumber,
                     Keyword = kw
                 };
foreach (var matchInfo in allMatches)
    Console.WriteLine("File => {0} Line => {1} Keyword => {2} Line Number => {3}",
        matchInfo.File, matchInfo.Line, matchInfo.Keyword, matchInfo.LineNumber);
Upvotes: 0
Reputation: 460108
You can use Directory.EnumerateFiles with a search pattern and the recursive option (SearchOption.AllDirectories). The rest is easy with LINQ:
// The phrases to search for (not the Y/N flags from the desired output)
var keyWords = new[] { "Keyword1", "Keyword2", "Keyword3" };
var allDotFiles = Directory.EnumerateFiles(folder, "*.dot", SearchOption.AllDirectories);
var allTxtFiles = Directory.EnumerateFiles(folder, "*.txt", SearchOption.AllDirectories);
var allFiles = allDotFiles.Concat(allTxtFiles);
var allMatches = from fn in allFiles
                 from line in File.ReadLines(fn)
                 from kw in keyWords
                 where line.Contains(kw)
                 select new
                 {
                     File = fn,
                     Line = line,
                     Keyword = kw
                 };
foreach (var matchInfo in allMatches)
    Console.WriteLine("File => {0} Line => {1} Keyword => {2}",
        matchInfo.File, matchInfo.Line, matchInfo.Keyword);
Note that you need to add using System.Linq;
Is there a way just to get the line number?
If you just want the line numbers you can use this query:
var matches = allFiles.Select(fn => new
    {
        File = fn,
        LineIndices = String.Join(",",
            File.ReadLines(fn)
                .Select((l, i) => new { Line = l, Index = i })
                .Where(x => keyWords.Any(w => x.Line.Contains(w)))
                .Select(x => x.Index)),
    })
    .Where(x => x.LineIndices.Any());
foreach (var match in matches)
    Console.WriteLine("File => {0} Linenumber => {1}",
        match.File, match.LineIndices);
It's a little more involved because LINQ's query syntax has no way to pass the element index, so you need the method-syntax overload of Select that provides it.
Upvotes: 3
Reputation: 14467
The first step: locate all files. It is easily done with System.IO.Directory.GetFiles() + System.IO.File.ReadAllText(), as others have mentioned.
The second step: find keywords in a file. With a single keyword this is simple and can be done with the IndexOf() method, but scanning a file once per keyword (especially a big file) is wasteful.
To quickly find multiple keywords in a text I think you should use the Aho-Corasick automaton (algorithm). See the C# implementation at CodeProject: http://www.codeproject.com/Articles/12383/Aho-Corasick-string-matching-in-C
Upvotes: 2
Reputation:
What you want is to recursively iterate over the files in a directory (and maybe its subdirectories).
So your steps would be to loop over each file in the specified directory with GetFiles() from .NET and, whenever you encounter a directory, loop over it in the same way.
This can be easily done with this code sample:
public static IEnumerable<string> GetFiles(string path)
{
    foreach (string s in Directory.GetFiles(path, "*.extension_here"))
    {
        yield return s;
    }
    foreach (string s in Directory.GetDirectories(path))
    {
        foreach (string s1 in GetFiles(s))
        {
            yield return s1;
        }
    }
}
A more in-depth look at iterating through files in directories in .NET is located here:
http://blogs.msdn.com/b/brada/archive/2004/03/04/84069.aspx
Then use the IndexOf method of String to check whether your keywords are in the file. I discourage the use of ReadAllText: if your file is 5 MB, your string will be too. Reading line by line is less memory-hungry.
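The line-by-line IndexOf check could look roughly like the sketch below. The class and method names (KeywordScanner, FindKeywords) are made up for illustration, and the sketch assumes a case-sensitive ordinal comparison and that a keyword never spans a line break.

```csharp
using System;
using System.Collections.Generic;
using System.IO;
using System.Linq;

// Hypothetical helper, not part of .NET: reports which keywords occur in a file.
public static class KeywordScanner
{
    // Returns a map from keyword to whether it occurs anywhere in the file.
    // File.ReadLines streams the file lazily, so only one line is held at a time.
    public static Dictionary<string, bool> FindKeywords(string path, IReadOnlyList<string> keywords)
    {
        var found = keywords.Distinct().ToDictionary(k => k, k => false);
        int remaining = found.Count;

        foreach (string line in File.ReadLines(path))
        {
            foreach (string kw in keywords)
            {
                // Assumption: case-sensitive match; change the comparison if needed.
                if (!found[kw] && line.IndexOf(kw, StringComparison.Ordinal) >= 0)
                {
                    found[kw] = true;
                    remaining--;
                }
            }

            // Stop reading as soon as every keyword has been seen.
            if (remaining == 0)
            {
                break;
            }
        }

        return found;
    }
}
```

Calling FindKeywords for every path returned by GetFiles and printing "Y" for true and "N" for false gives one row per file, as in the table in the question.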
Upvotes: 5