seebiscuit

Reputation: 5053

IEnumerable implementation breaks on foreach

I'm using the PDFNet library to extract objects from a PDF and then run OCR on them. I instantiate my Elements object in Main():

public class Processor
{
    public static int Main(string[] args)
    {
       Elements pdfPageElements = new Elements(pdfPage);
       ...

The constructor (in a separate class) looks like this:

internal class Elements : IEnumerator<Element>, IEnumerable<Element>
{
    private readonly int _position;
    private readonly ElementReader _pdfElements;
    private Element _current;

    public Elements(Page currentPage)
    {
        _pdfElements = new ElementReader();
        _pdfElements.Begin(currentPage);
        _position = 0;
    }

    ...

After instantiating pdfPageElements, I go back to Main() and use LINQ to iterate through the collection and pick out the PDF objects I want (in this case, images).

var pdfPageImages = (from e in pdfPageElements
                     where
                         (e.GetType() == Element.Type.e_inline_image ||
                          e.GetType() == Element.Type.e_image)
                     select e);

The PDFNet SDK implements the MoveNext() method as follows:

public bool MoveNext()
{
    if ((_current = _pdfElements.Next()) != null)
    {
        return true;
    }
    else
    {
        _pdfElements.Dispose();
        return false;
    }
}

pdfPageImages is instantiated without problems; Console.WriteLine(pdfPageImages.Count()); prints the correct number of images for my test PDF.

But when I send pdfPageImages through a foreach loop I get the following exception:

pdftron.Common.PDFNetException: Unknown exception.
 at pdftron.PDF.ElementReader.Next()
 at pdftron.Elements.MoveNext()
 at System.Linq.Enumerable.WhereEnumerableIterator`1.MoveNext()
 at DM_PDFProcessor.Processor.Main(String[] args)

It's probably worth noting that the PDFNet documentation states:

Every call to ElementReader::Next() destroys the current Element. 
Therefore, an Element becomes invalid after subsequent 
ElementReader::Next() operation.

However, once the element is read into the IEnumerable pdfPageImages, it should be iterable indefinitely (from my limited understanding).


Note that the elements in the collection are definitely not null. Any ideas why I keep getting the exception?

Upvotes: 1

Views: 200

Answers (1)

alsed42

Reputation: 1216

Note that

var pdfPageImages = (from e in pdfPageElements
                     where
                         (e.GetType() == Element.Type.e_inline_image ||
                          e.GetType() == Element.Type.e_image)
                     select e);

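is lazily evaluated. That is, every time pdfPageImages is enumerated, pdfPageElements is enumerated again. The Count() call is one enumeration and the foreach loop is a second one. Since MoveNext() disposes the ElementReader once it reaches the end, the second pass presumably calls Next() on a reader that has already been disposed, which matches the stack trace. So if the Elements class is built so that an instance can only be enumerated once without throwing, you might want to cache the query result: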

var pdfPageImages = (from e in pdfPageElements
                     where
                         (e.GetType() == Element.Type.e_inline_image ||
                          e.GetType() == Element.Type.e_image)
                     select e).ToList();
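
To see the effect without PDFNet, here is a minimal sketch. SingleUseSource is a hypothetical stand-in for Elements, not the SDK's class: it assumes, as the question's code suggests, that the object acts as its own enumerator and shuts down its reader when MoveNext() returns false.

```csharp
using System;
using System.Collections;
using System.Collections.Generic;
using System.Linq;

// Hypothetical stand-in for Elements: it is its own enumerator and
// closes its underlying "reader" once it reaches the end.
internal sealed class SingleUseSource : IEnumerable<int>, IEnumerator<int>
{
    private readonly int[] _items = { 1, 2, 3, 4 };
    private int _index = -1;
    private bool _closed;

    public int Current => _items[_index];
    object IEnumerator.Current => Current;

    public bool MoveNext()
    {
        if (_closed)
            throw new InvalidOperationException("Reader already closed.");
        if (++_index < _items.Length)
            return true;
        _closed = true; // plays the role of _pdfElements.Dispose()
        return false;
    }

    public void Reset() => throw new NotSupportedException();
    public void Dispose() { }

    // Returning 'this' is what makes the sequence single-use.
    public IEnumerator<int> GetEnumerator() => this;
    IEnumerator IEnumerable.GetEnumerator() => this;
}

public static class Demo
{
    public static void Main()
    {
        // Deferred query: nothing is read from the source yet.
        var lazy = new SingleUseSource().Where(n => n % 2 == 0);
        Console.WriteLine(lazy.Count()); // first pass over the source succeeds

        try
        {
            // Second pass re-enumerates the same, already closed source.
            foreach (var n in lazy) Console.WriteLine(n);
        }
        catch (InvalidOperationException ex)
        {
            Console.WriteLine("Second pass failed: " + ex.Message);
        }

        // ToList() runs the query once and keeps the results in memory.
        var cached = new SingleUseSource().Where(n => n % 2 == 0).ToList();
        Console.WriteLine(cached.Count);
        foreach (var n in cached) Console.WriteLine(n);
    }
}
```

The list can be counted and looped over as often as needed because the source is only read once. Whether the cached Element objects themselves stay valid after later ElementReader.Next() calls is a separate question, given the documentation quoted in the question; you may need to copy out the data you need from each element while it is current.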

Upvotes: 3
