Reputation: 9398
I am trying to search for a particular occurrence of a string in some files belonging to a directory (the search is also performed in the subdirectories). Currently, I came up with a solution something like this, continuing until the last file:
string[] fileNames = Directory.GetFiles(@"d:\test", "*.txt", SearchOption.AllDirectories);
foreach (string sTem in fileNames)
{
    foreach (string line in File.ReadAllLines(sTem))
    {
        if (line.Contains(SearchString))
        {
            MessageBox.Show("Found search string!");
            break;
        }
    }
}
I think there must be other methods/approaches that are more efficient and faster than this. Using a batch file? OK. Another solution is to use findstr, but how can it be invoked directly from a C# program without a batch file? What is the most efficient approach (or at least more efficient than what I did)? Code examples are much appreciated!
I found another solution:
Process myproc = new Process();
myproc.StartInfo.FileName = "findstr";
myproc.StartInfo.Arguments = "/m /s /d:\"c:\\REQs\" \"madhuresh\" *.req";
myproc.StartInfo.RedirectStandardOutput = true;
myproc.StartInfo.UseShellExecute = false;
myproc.Start();
string output = myproc.StandardOutput.ReadToEnd();
myproc.WaitForExit();
Is this way of launching a process good? Comments on this are welcome too!
Based on @AbitChev's method, here is a sleeker version (I don't know whether it's more efficient!). This one searches the directory as well as all the subdirectories:
IEnumerable<string> s = from file in Directory.EnumerateFiles("c:\\directorypath", "*.req", SearchOption.AllDirectories)
                        from str in File.ReadLines(file)
                        //where str.Contains("Text@tosearched2")
                        where str.IndexOf(sSearchItem, StringComparison.OrdinalIgnoreCase) >= 0
                        select file;

foreach (string sa in s)
    MessageBox.Show(sa);
(The IndexOf overload with OrdinalIgnoreCase makes the search case-insensitive. Maybe that could help someone.) Please comment! Thanks.
Upvotes: 1
Views: 1133
Reputation: 11
This works well. I searched around 500 terms over 230 files in under .5 milliseconds. Note that this is very memory intensive; it loads every file into memory.
public class FindInDirectory
{
    public class Match
    {
        public string Pattern { get; set; }
        public string Directory { get; set; }
        public MatchCollection Matches { get; set; }
    }

    public static List<FindInDirectory.Match> Search(string directory, string searchPattern, List<string> patterns)
    {
        // find all file locations
        IEnumerable<string> files = System.IO.Directory.EnumerateFiles(directory, searchPattern, System.IO.SearchOption.AllDirectories);

        // load all text into memory for MULTI-PATTERN search;
        // this greatly increases speed, but it requires a ton of memory!
        Dictionary<string, string> contents = files.ToDictionary(f => f, f => System.IO.File.ReadAllText(f));

        List<FindInDirectory.Match> directoryMatches = new List<Match>();
        foreach (string pattern in patterns)
        {
            directoryMatches.AddRange
            (
                contents.Select(c => new Match
                {
                    Pattern = pattern,
                    Directory = c.Key,
                    Matches = Regex.Matches(c.Value, pattern, RegexOptions.IgnoreCase | RegexOptions.Multiline)
                })
                .Where(c => c.Matches.Count > 0) // switch to > 1 when program directory is same as or a child of the search directory
            );
        }
        return directoryMatches;
    }
}
USE:
static void Main(string[] args)
{
    List<string> patterns = new List<string>
    {
        "class",
        "foreach",
        "main",
    };

    string searchPattern = "*.cs";
    string directory = "C:\\SearchDirectory";

    DateTime start = DateTime.UtcNow;
    FindInDirectory.Search(directory, searchPattern, patterns);
    Console.WriteLine((DateTime.UtcNow - start).TotalMilliseconds);
    Console.ReadLine();
}
Upvotes: 1
Reputation: 100248
Use Directory.EnumerateFiles() and File.ReadLines() - both provide lazy loading of data:
from file in Directory.EnumerateFiles(path)
from line in File.ReadLines(file)
where line.Contains(pattern)
select new
{
    FileName = file, // file containing matched string
    Line = line      // matched string
};
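To see the lazy query end to end, here is a minimal self-contained sketch (the temp-directory setup and file names are illustrative, not part of the answer above):

```csharp
using System;
using System.IO;
using System.Linq;

class LazySearchDemo
{
    static void Main()
    {
        // set up a small throwaway directory tree to search (illustrative only)
        string root = Path.Combine(Path.GetTempPath(), "lazy-search-demo");
        Directory.CreateDirectory(Path.Combine(root, "sub"));
        File.WriteAllLines(Path.Combine(root, "a.txt"), new[] { "nothing here" });
        File.WriteAllLines(Path.Combine(root, "sub", "b.txt"), new[] { "needle in line" });

        string pattern = "needle";

        // neither EnumerateFiles nor ReadLines materializes everything up front;
        // files and lines are pulled one at a time as the query is enumerated
        var matches = from file in Directory.EnumerateFiles(root, "*.txt", SearchOption.AllDirectories)
                      from line in File.ReadLines(file)
                      where line.Contains(pattern)
                      select new { FileName = file, Line = line };

        foreach (var m in matches)
            Console.WriteLine("{0}: {1}", Path.GetFileName(m.FileName), m.Line);

        Directory.Delete(root, true);
    }
}
```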
or
foreach (var file in Directory.EnumerateFiles(path).AsParallel())
{
    var matched = new List<string>();
    try
    {
        foreach (var line in File.ReadLines(file))
        {
            // one more try here?
            if (line.Contains(pattern))
            {
                matched.Add(line);
            }
        }
    }
    catch (SecurityException)
    {
        // swallow or log
    }
    // yield outside the try block, since C# forbids
    // yield return inside a try with a catch clause
    foreach (var line in matched)
    {
        yield return new
        {
            FileName = file, // file containing matched string
            Line = line      // matched string
        };
    }
}
Upvotes: 3
Reputation: 35696
How about something like this:
var found = false;
string file = null;
foreach (var f in Directory.EnumerateFiles(
    "d:\\tes\\",
    "*.txt",
    SearchOption.AllDirectories))
{
    file = f;
    foreach (var line in File.ReadLines(file))
    {
        if (line.Contains(searchString))
        {
            found = true;
            break;
        }
    }
    if (found)
    {
        break;
    }
}
if (found)
{
    var message = string.Format("Search string found in \"{0}\".", file);
    MessageBox.Show(message);
}
This has the advantage of loading into memory only what is required, rather than first the names of all the files and then the contents of each file.
I note you are using String.Contains, which performs an ordinal (case-sensitive and culture-insensitive) comparison. This allows us to do a simple character-wise compare.
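As a quick illustration of what "ordinal" means here (a sketch, not part of the answer itself): Contains compares exact char values, so casing matters and culture rules never apply, while IndexOf with OrdinalIgnoreCase keeps the char-wise compare but folds case:

```csharp
using System;

class OrdinalDemo
{
    static void Main()
    {
        string line = "Searching for Madhuresh in this line";

        // ordinal: exact char-by-char match, case-sensitive
        Console.WriteLine(line.Contains("Madhuresh")); // True
        Console.WriteLine(line.Contains("madhuresh")); // False

        // ordinal ignore-case: same char-wise compare, but casing is folded
        Console.WriteLine(line.IndexOf("madhuresh", StringComparison.OrdinalIgnoreCase) >= 0); // True
    }
}
```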
I'd start with a little helper function
private static bool CompareCharBuffers(
    char[] buffer,
    int headPosition,
    char[] stringChars)
{
    // null checking and length comparison omitted
    var same = true;
    var bufferPos = headPosition;
    for (var i = 0; i < stringChars.Length; i++)
    {
        if (!stringChars[i].Equals(buffer[bufferPos]))
        {
            same = false;
            break;
        }
        bufferPos = (bufferPos + 1) % buffer.Length;
    }
    return same;
}
Then I'd alter the previous algorithm to use the function like this.
var stringChars = searchString.ToCharArray();
var found = false;
string file = null;
foreach (var f in Directory.EnumerateFiles(
    "d:\\tes\\",
    "*.txt",
    SearchOption.AllDirectories))
{
    file = f;
    using (var reader = File.OpenText(file))
    {
        var buffer = new char[stringChars.Length];
        if (reader.ReadBlock(buffer, 0, buffer.Length - 1)
                < stringChars.Length - 1)
        {
            continue;
        }
        var head = 0;
        var nextPos = buffer.Length - 1;
        var nextChar = reader.Read();
        while (nextChar != -1)
        {
            buffer[nextPos] = (char)nextChar;
            if (CompareCharBuffers(buffer, head, stringChars))
            {
                found = true;
                break;
            }
            head = (head + 1) % buffer.Length;
            if (head == 0)
            {
                nextPos = buffer.Length - 1;
            }
            else
            {
                nextPos = head - 1;
            }
            nextChar = reader.Read();
        }
        if (found)
        {
            break;
        }
    }
}
if (found)
{
    var message = string.Format("Search string found in \"{0}\".", file);
    MessageBox.Show(message);
}
This holds in memory only as many chars as the search string contains, and uses a rolling buffer across each file. (Theoretically a file could contain no newlines at all, so line-based reading could consume your whole disk's worth of data in one string; equally, your search string could itself contain a newline, which line-based reading would never match.)
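As a sanity check of the rolling-buffer idea, the same compare can be driven from an in-memory StringReader instead of a file; ContainsPattern below is an illustrative stand-in for the per-file loop above, not code from the answer:

```csharp
using System;
using System.IO;

class RollingBufferDemo
{
    static bool CompareCharBuffers(char[] buffer, int headPosition, char[] stringChars)
    {
        var bufferPos = headPosition;
        for (var i = 0; i < stringChars.Length; i++)
        {
            if (stringChars[i] != buffer[bufferPos])
                return false;
            bufferPos = (bufferPos + 1) % buffer.Length; // wrap around the ring
        }
        return true;
    }

    // same rolling-buffer scan as the per-file loop, but over any TextReader
    static bool ContainsPattern(TextReader reader, char[] stringChars)
    {
        var buffer = new char[stringChars.Length];
        if (reader.ReadBlock(buffer, 0, buffer.Length - 1) < stringChars.Length - 1)
            return false; // input shorter than the pattern
        var head = 0;
        var nextPos = buffer.Length - 1;
        int nextChar;
        while ((nextChar = reader.Read()) != -1)
        {
            buffer[nextPos] = (char)nextChar;
            if (CompareCharBuffers(buffer, head, stringChars))
                return true;
            head = (head + 1) % buffer.Length;
            nextPos = head == 0 ? buffer.Length - 1 : head - 1;
        }
        return false;
    }

    static void Main()
    {
        var pattern = "needle".ToCharArray();
        Console.WriteLine(ContainsPattern(new StringReader("hay needle stack"), pattern)); // True
        Console.WriteLine(ContainsPattern(new StringReader("no match at all"), pattern));  // False
        Console.WriteLine(ContainsPattern(new StringReader("nee"), pattern));              // False: shorter than pattern
    }
}
```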
As further work, I'd convert the per-file part of the algorithm into a function and investigate a multi-threaded approach. So this would be the internal function:
static bool FileContains(string file, char[] stringChars)
{
    using (var reader = File.OpenText(file))
    {
        var buffer = new char[stringChars.Length];
        if (reader.ReadBlock(buffer, 0, buffer.Length - 1)
                < stringChars.Length - 1)
        {
            return false;
        }
        var head = 0;
        var nextPos = buffer.Length - 1;
        var nextChar = reader.Read();
        while (nextChar != -1)
        {
            buffer[nextPos] = (char)nextChar;
            if (CompareCharBuffers(buffer, head, stringChars))
            {
                return true;
            }
            head = (head + 1) % buffer.Length;
            if (head == 0)
            {
                nextPos = buffer.Length - 1;
            }
            else
            {
                nextPos = head - 1;
            }
            nextChar = reader.Read();
        }
        return false;
    }
}
Then you could process the files in parallel like this:
var stringChars = searchString.ToCharArray();
if (Directory.EnumerateFiles(
"d:\\tes\\",
"*.txt",
SearchOption.AllDirectories)
.AsParallel()
.Any(file => FileContains(file, stringChars)))
{
MessageBox.Show("Found search string!");
}
Upvotes: 2
Reputation: 3636
You can create a "pipeline" with TPL Dataflow (this .dll isn't currently part of .NET 4.5, but you can download it from here) to consume all the files and search for explicit strings. Take a look at this reference implementation.
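TPL Dataflow itself needs the separate download mentioned above, so here is a sketch of the same producer/consumer pipeline shape using only the base class library; BlockingCollection is a stand-in I'm substituting for the Dataflow blocks, and the paths and pattern are illustrative:

```csharp
using System;
using System.Collections.Concurrent;
using System.IO;
using System.Linq;
using System.Threading.Tasks;

class PipelineDemo
{
    static void Main()
    {
        // throwaway input files (illustrative only)
        string root = Path.Combine(Path.GetTempPath(), "pipeline-demo");
        Directory.CreateDirectory(root);
        File.WriteAllText(Path.Combine(root, "a.txt"), "alpha\nneedle beta\n");
        File.WriteAllText(Path.Combine(root, "b.txt"), "gamma\n");

        string pattern = "needle";
        var queue = new BlockingCollection<string>(boundedCapacity: 16);
        var matches = new ConcurrentBag<string>();

        // producer stage: enumerate file names into the bounded queue
        var producer = Task.Run(() =>
        {
            foreach (var file in Directory.EnumerateFiles(root, "*.txt"))
                queue.Add(file);
            queue.CompleteAdding();
        });

        // consumer stage: search each file as it arrives
        var consumer = Task.Run(() =>
        {
            foreach (var file in queue.GetConsumingEnumerable())
                if (File.ReadLines(file).Any(line => line.Contains(pattern)))
                    matches.Add(Path.GetFileName(file));
        });

        Task.WaitAll(producer, consumer);
        Console.WriteLine(string.Join(",", matches.OrderBy(f => f)));

        Directory.Delete(root, true);
    }
}
```

With Dataflow installed, the queue/consumer pair maps naturally onto a BufferBlock feeding an ActionBlock.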
Upvotes: 0