Delanoy
Delanoy

Reputation: 41

looping for PDF files

My program goes into a directory and searches for pdf files to parse them. This program is always running so I need to make sure to don't parse the same file over again.

I used a list to store the file names and then check if they are in there.

My code does not work in respect to that, if anyone can take a look and see whats wrong it would be greatly appreciated.

FileInfo[] filePaths = di.GetFiles("*.pdf");
for (int i = 0; i < filePaths.Length; i++)
{
    foreach (string fileName in usedFileNames)
    {
        if (fileName.Equals(filePaths[i].Name))
        {
            isInList = true;
        }
        else
        {
            isInList = false;
        }
    }
    if (isInList == false)
    {
        PDFReaderChooser chooser = new PDFReaderChooser(filePaths[i].Name);
        usedFileNames.Add(filePaths[i].Name);
    }

}

Upvotes: 2

Views: 375

Answers (4)

as-cii
as-cii

Reputation: 13019

Try this:

FileInfo[] filePaths = di.GetFiles("*.pdf");
foreach(FileInfo fInfo in filePaths)
{
    if (!usedFileNames.Contains(fInfo.Name))
    {
        PDFReaderChooser chooser = new PDFReaderChooser(fInfo.Name);
        usedFileNames.Add(fInfo.Name);
    }
}

As I commented on your question, the code you posted doesn't work because you have to insert a break statement, like this:

for (int i = 0; i < filePaths.Length; i++)
{
    bool isInList = false;

    foreach (string fileName in usedFileNames)
    {
        if (fileName.Equals(filePaths[i].Name))
            isInList = true;
    }

    if (isInList == false)
    {
        Console.WriteLine("Not in list! #{0}", x);
        usedFileNames.Add(filePaths[i].Name);
    }
}

Anyway I recommend you to use one of the techniques shown in this question replies.

Upvotes: 0

phoog
phoog

Reputation: 43036

While the other answers are better solutions to the problem, they don't explain why the original code didn't work. The problem is that the algorithm overwrites the value of the isInList variable, which will therefore only be true for the last file in the list. This would fix that problem:

FileInfo[] filePaths = di.GetFiles("*.pdf"); 
for (int i = 0; i < filePaths.Length; i++) 
{ 
    isInList = false
    foreach (string fileName in usedFileNames) 
    { 
        if (fileName.Equals(filePaths[i].Name)) 
        { 
            isInList = true;
            break;
        } 
    } 
    if (isInList == false) 
    { 
        PDFReaderChooser chooser = new PDFReaderChooser(filePaths[i].Name); 
        usedFileNames.Add(filePaths[i].Name); 
    } 
} 

I would add that it is better to use a HashSet instead of a List for your usedFileNames collection. The hash set is designed to determine efficiently whether it contains a given item. The list, if I recall correctly, does a linear search, which (for large numbers of items) is inefficient.

Upvotes: 3

Domenic
Domenic

Reputation: 112807

More concise still:

var fileNames = di.GetFiles("*.pdf")
                  .Select(f => f.Name)
                  .Where(n => !usedFileNames.Contains(n));
usedFileNames.AddRange(fileNames);

foreach (var fileName in fileNames)
{
    var chooser = new PDFReaderChooser(fileName);
}

This nicely abstracts away the logic that figures out which file names you need to process (outside the loop), from the logic that processes them (inside the loop).

Upvotes: 4

Matthew Jones
Matthew Jones

Reputation: 26190

A LINQ Contains operation would make this much more concise (assuming usedFileNames is a List<string>):

FileInfo[] filePaths = di.GetFiles("*.pdf");
foreach(FileInfo myInfo in filePaths)
{
    if (!usedFileNames.Contains(myInfo.Name))
    {
        PDFReaderChooser chooser = new PDFReaderChooser(myInfo.Name);
        usedFileNames.Add(myInfo.Name);
    }

}

Upvotes: 0

Related Questions