user1797147

Reputation: 937

C# iTextSharp: locate words (not chunks) on a page, with their coordinates, for adding sticky notes

I have already read all the related Stack Overflow questions and haven't found a decent solution to this. I want to open a PDF, get the text (words) and their coordinates, and then add a sticky note to some of them.

This seems like mission impossible, and I'm stuck.

How come this code correctly finds all the words on a page (but not their coordinates)?

    using (PdfReader reader = new PdfReader(path))
    {
        ITextExtractionStrategy strategy = new SimpleTextExtractionStrategy();
        for (int page = 5; page <= 5; page++) // only page 5 for now
        {
            // Returns the page text as one string, without any position information
            string text = PdfTextExtractor.GetTextFromPage(reader, page, strategy);

            Console.WriteLine(text);
        }
    }

This one, on the other hand, gets coordinates, but only for "chunks", and I cannot rely on them being in the proper order.

    PdfReader reader = new PdfReader(path);
    PdfReaderContentParser parser = new PdfReaderContentParser(reader);

    LocationTextExtractionStrategyEx strategy;

    for (int i = 5; i <= 5; i++) // reader.NumberOfPages
    {
        strategy = parser.ProcessContent(i, new LocationTextExtractionStrategyEx("MCU_MOSI", 0));

        foreach (LocationTextExtractionStrategyEx.ExtendedTextChunk chunk in strategy.m_DocChunks)
        {
            if (chunk.m_text.Trim() == "MCU_MOSI")
                Console.WriteLine("Bingo");  // <-- NEVER HIT
        }
    }

This uses a class (slightly modified by me) from this post: "Getting Coordinates of string using ITextExtractionStrategy and LocationTextExtractionStrategy in Itextsharp".

But it only finds useless "chunks".

So the question is: can iTextSharp really locate words on a page, so that I can add sticky notes near them? Thank you.

Upvotes: 1

Views: 1006

Answers (1)

ktyson

Reputation: 84

It looks like chunk.m_text only contains one letter at a time, which is why this will never be true:

    if (chunk.m_text.Trim() == "MCU_MOSI")

What you could do instead is append each chunk's text to a string and check whether that string contains your text.

    PdfReader reader = new PdfReader(path);
    PdfReaderContentParser parser = new PdfReaderContentParser(reader);

    LocationTextExtractionStrategyEx strategy;
    string str = string.Empty;

    for (int i = 5; i <= 5; i++) // reader.NumberOfPages
    {
        strategy = parser.ProcessContent(i, new LocationTextExtractionStrategyEx("MCU_MOSI", 0));
        foreach (LocationTextExtractionStrategyEx.ExtendedTextChunk chunk in strategy.m_DocChunks)
        {
            // Accumulate the one-letter chunks until the search text appears
            str += chunk.m_text;
            if (str.Contains("MCU_MOSI"))
            {
                str = string.Empty; // reset so later occurrences are found too
                // chunk is the last letter of the match, so this is where the match ends
                Vector location = chunk.m_endLocation;
                Console.WriteLine("Bingo");
            }
        }
    }

Note: for the location example, I made m_endLocation public.
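The accumulation idea above could also be extended to recover whole words together with where they start and end, which is what the question asks for. The following is only a minimal sketch and does not use iTextSharp: `Glyph`, `Word`, `WordGrouper` and the `maxGap` tolerance are hypothetical names introduced for illustration, with `Glyph` standing in for the strategy's chunk type. It assumes each chunk holds a single letter or whitespace and that the chunks are already sorted top to bottom, left to right.

```csharp
using System;
using System.Collections.Generic;
using System.Text;

// Hypothetical stand-in for a text chunk: its text plus the start/end
// x-coordinates and the y-coordinate of its baseline.
public sealed class Glyph
{
    public string Text;
    public float StartX, EndX, Y;

    public Glyph(string text, float startX, float endX, float y)
    {
        Text = text; StartX = startX; EndX = endX; Y = y;
    }
}

// A merged word with the horizontal extent it covers on its baseline.
public sealed class Word
{
    public string Text;
    public float StartX, EndX, Y;
}

public static class WordGrouper
{
    // Merges chunks into words. A word ends at a whitespace chunk, at a
    // baseline change, or at a horizontal gap wider than maxGap
    // (an assumed tolerance, in the same units as the coordinates).
    public static List<Word> Group(IEnumerable<Glyph> chunks, float maxGap)
    {
        var words = new List<Word>();
        var text = new StringBuilder();
        Word current = null;

        foreach (Glyph g in chunks)
        {
            bool blank = string.IsNullOrWhiteSpace(g.Text);
            bool breaks = current != null &&
                (blank
                 || Math.Abs(g.Y - current.Y) > 0.5f
                 || g.StartX - current.EndX > maxGap);

            if (breaks)
            {
                // Close the word collected so far
                current.Text = text.ToString();
                words.Add(current);
                current = null;
                text.Clear();
            }
            if (blank) continue; // whitespace only separates words

            if (current == null)
                current = new Word { StartX = g.StartX, Y = g.Y };

            text.Append(g.Text);
            current.EndX = g.EndX; // extend the word to this chunk
        }

        if (current != null)
        {
            // Flush the final word
            current.Text = text.ToString();
            words.Add(current);
        }
        return words;
    }
}
```

To try this with the strategy from the question, each ExtendedTextChunk would have to be mapped to a `Glyph` from its text and its start and end locations (which may need to be made public, as with m_endLocation). The resulting `Word` list can then be searched for "MCU_MOSI", and the matching word's coordinates used to place the note.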

Upvotes: 1
