Karthik Ravichandran

Reputation: 1283

Doc and docx file types return the same file

I am trying to get the document files in the path shown below, so I set the file types to doc and docx.

In the Documents folder, I have the file "issues.docx".

 string path1 = "C:\\Documents\\";
 var dir =  new DirectoryInfo(path1);
 string[] fileTypes = { "doc", "docx" };       
 var myFiles = fileTypes.SelectMany(dir.GetFiles);

This is the code I use in my application. It returns the issues.docx file twice, but it should return each file only once. How can I achieve that without changing fileTypes?

Upvotes: 0

Views: 454

Answers (1)

Panagiotis Kanavos

Reputation: 131180

NTFS isn't a search engine. Performing two separate searches will result in two scans of all the files, taking double the time.

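It would be faster to use EnumerateFiles with a single pattern that covers both extensions and split the results by extension afterwards, eg with ToLookup. (ToDictionary would throw as soon as two files share an extension, because its keys must be unique.)

var filesByExtension=dir.EnumerateFiles("*.doc?")
                        .ToLookup(fi=>fi.Extension);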

You can also group the results, if you want, eg to calculate statistics:

dir.EnumerateFiles("*.doc?")
   .GroupBy(fi=>fi.Extension)
   .Select(g=>new {
                      Extension=g.Key,
                      TotalSize=g.Sum(f=>f.Length), 
                      Files=g.ToArray()
           });
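
If the list of patterns from the question has to stay as it is, the duplicates can also be dropped after the fact. The following is only a minimal sketch, not a tested solution for the asker's setup: `DocFinder` and `FindDistinct` are hypothetical names, and the patterns are assumed to be wildcard patterns such as "*.doc" and "*.docx". It still scans the folder once per pattern, so it is slower than a single search.

```csharp
using System;
using System.Collections.Generic;
using System.IO;
using System.Linq;

static class DocFinder
{
    // Hypothetical helper: runs one search per pattern, then keeps
    // a single FileInfo for each distinct full path.
    public static List<FileInfo> FindDistinct(DirectoryInfo dir, IEnumerable<string> patterns)
    {
        return patterns
            .SelectMany(p => dir.EnumerateFiles(p))
            // Group on the full path, ignoring case, so a file matched
            // by several patterns collapses into one group.
            .GroupBy(fi => fi.FullName, StringComparer.OrdinalIgnoreCase)
            .Select(g => g.First())
            .ToList();
    }
}
```

It could be called as `DocFinder.FindDistinct(dir, fileTypes)`. Grouping on FullName is used because FileInfo does not compare by path, so a plain Distinct() would not remove the repeated entries.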

If you want accelerated searching, you can use the Windows Search service. Calling it isn't straightforward though: you have to query it as if it were an OLEDB database. The results may not be accurate either, if the indexer is still scanning the files.

UPDATE

If there is nothing common in the file types to search, filtering can be performed in a Where expression:

var extensions=new[]{".doc",".docx",".png",".jpg"};
dir.EnumerateFiles()
   .Where(fi=>extensions.Contains(fi.Extension))
   .GroupBy(fi=>fi.Extension)
   .Select(g=>new {
                      Extension=g.Key,
                      TotalSize=g.Sum(f=>f.Length), 
                      Files=g.ToArray()
           });

Where can also filter on other properties, eg to keep only files larger than 1024 bytes:

var extensions=new[]{".doc",".docx",".png",".jpg"};
dir.EnumerateFiles()
   .Where(fi=>extensions.Contains(fi.Extension) && fi.Length>1024)
   .GroupBy(fi=>fi.Extension)
   .Select(g=>new {
                      Extension=g.Key,
                      TotalSize=g.Sum(f=>f.Length), 
                      Files=g.ToArray()
           });
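
One thing to watch in the Where filters above: Contains on a string array compares with case sensitivity, so a file named REPORT.DOCX would be skipped. A possible variation, sketched here under the assumption that case should be ignored (`ExtensionFilter`, `Wanted` and `Filter` are illustrative names only), is to keep the extensions in a HashSet with a case-insensitive comparer:

```csharp
using System;
using System.Collections.Generic;
using System.IO;
using System.Linq;

static class ExtensionFilter
{
    // Extensions are compared without regard to case,
    // so ".DOCX" and ".docx" are treated as the same entry.
    static readonly HashSet<string> Wanted = new HashSet<string>(
        new[] { ".doc", ".docx", ".png", ".jpg" },
        StringComparer.OrdinalIgnoreCase);

    // Hypothetical helper: lazily enumerates the folder once and
    // yields only the files whose extension is in the set.
    public static IEnumerable<FileInfo> Filter(DirectoryInfo dir)
    {
        return dir.EnumerateFiles().Where(fi => Wanted.Contains(fi.Extension));
    }
}
```

The result can be fed into the same GroupBy/Select chain as before.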

Upvotes: 2
