Quang

Reputation: 183

How to get the range of the first paragraph on each page of a Word document using C# Word Interop

I have a Word file with 9 pages.

I use this:

Microsoft.Office.Interop.Word.Application wordApp = new Microsoft.Office.Interop.Word.Application();
Microsoft.Office.Interop.Word.Document wordDoc = wordApp.Documents.Open(file);
Microsoft.Office.Interop.Word.Range docRange = wordDoc.Range();

But this code gives me the range of the whole document, i.e. all paragraphs.

How can I get the range of the text in the first line (or first paragraph) of each page using C# Word Interop?

Sorry about my English.

For example, on the first page I want to get the text:

"Apple Inc. is an American multinational technology company headquartered in Cupertino, California,"

or the first paragraph:

"Apple Inc. is an American multinational technology company headquartered in Cupertino, California, that designs, develops, and sells consumer electronics, computer software, and online services. It is considered one of the Big Four technology companies, alongside Amazon, Google, and Microsoft."

On the second page, the text I want is:

Apple was founded by Steve Jobs, Steve Wozniak, and Ronald Wayne in April 1976 to develop and sell

or

Apple was founded by Steve Jobs, Steve Wozniak, and Ronald Wayne in April 1976 to develop and sell Wozniak's Apple I personal computer, though Wayne sold his share back within 12 days.


Upvotes: 2

Views: 1225

Answers (1)

Tomas Paul

Reputation: 540

You can iterate through all the paragraphs, get the page number of each one, and then select the first paragraph on each page.

using System.Collections.Generic;
using System.Linq;
using Word = Microsoft.Office.Interop.Word;

// Returns the first non-empty paragraph that ends on each page of the document.
private List<Paragraph> FindFirstParagraphOfEachPage(string filePath)
{
    Word.Application wordApp = new Word.Application();
    Word.Document wordDoc = wordApp.Documents.Open(filePath);

    var paragraphs = new List<Paragraph>();

    foreach (Word.Paragraph p in wordDoc.Paragraphs)
    {
        paragraphs.Add(new Paragraph()
        {
            // wdActiveEndPageNumber is the page on which the paragraph ends.
            PageNumber = (int)p.Range.get_Information(Word.WdInformation.wdActiveEndPageNumber),
            ParagraphText = p.Range.Text
        });
    }

    // Paragraphs are enumerated in document order, so the first item in each
    // page group is the first non-empty paragraph for that page.
    var result = paragraphs.Where(x => !string.IsNullOrWhiteSpace(x.ParagraphText))
                        .GroupBy(x => x.PageNumber)
                        .Select(x => x.First())
                        .ToList();

    wordDoc.Close();
    wordApp.NormalTemplate.Saved = true;
    wordApp.Quit();

    return result;
}

Helper class to store page number and paragraph text.

class Paragraph
{
    public int PageNumber { get; set; }
    public string ParagraphText { get; set; }
}
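The question asks for a Range rather than just the text. One possible variation, offered as an untested sketch, is to jump to the start of each page with Document.GoTo and take the paragraph found there. The class name PageRanges and the method name FirstParagraphRanges are hypothetical, and the sketch assumes the document is already open. It does not skip empty paragraphs, and if a paragraph carries over from the previous page, the returned range starts on that previous page.

```csharp
using System.Collections.Generic;
using Word = Microsoft.Office.Interop.Word;

static class PageRanges
{
    // Hypothetical helper: returns one Range per page, covering the paragraph
    // that contains the first character of that page.
    public static List<Word.Range> FirstParagraphRanges(Word.Document doc)
    {
        object missing = System.Type.Missing;
        object what = Word.WdGoToItem.wdGoToPage;
        object which = Word.WdGoToDirection.wdGoToAbsolute;

        // Total number of pages in the document as Word currently lays it out.
        int pageCount = doc.ComputeStatistics(Word.WdStatistic.wdStatisticPages, ref missing);

        var ranges = new List<Word.Range>();
        for (int page = 1; page <= pageCount; page++)
        {
            object count = page;

            // GoTo returns a collapsed Range positioned at the start of the page.
            Word.Range pageStart = doc.GoTo(ref what, ref which, ref count, ref missing);

            // Word collections are 1-based; this is the paragraph containing pageStart.
            ranges.Add(pageStart.Paragraphs[1].Range);
        }
        return ranges;
    }
}
```

Each returned Range exposes Text, Start and End, so the same list can serve both for reading the paragraph text and for formatting or editing it.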

I am not sure about releasing the COM objects; this will probably require some edits and testing.
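As a starting point for the cleanup, here is a minimal sketch that wraps the work in try/finally so that Word is closed even if an exception is thrown. The names WordSession and WithDocument are hypothetical. It assumes the document should be closed without saving, and it only releases the two top-level COM objects; intermediate objects such as wordApp.Documents are left to the garbage collector, which may or may not be enough in a given application.

```csharp
using System;
using System.Runtime.InteropServices;
using Word = Microsoft.Office.Interop.Word;

static class WordSession
{
    // Hypothetical wrapper: opens filePath, runs the supplied work,
    // then always closes the document and quits Word.
    public static void WithDocument(string filePath, Action<Word.Document> work)
    {
        Word.Application wordApp = null;
        Word.Document wordDoc = null;
        try
        {
            wordApp = new Word.Application();
            wordDoc = wordApp.Documents.Open(filePath);
            work(wordDoc);
        }
        finally
        {
            if (wordDoc != null)
            {
                wordDoc.Close(false); // assumption: discard any changes
                Marshal.ReleaseComObject(wordDoc);
            }
            if (wordApp != null)
            {
                wordApp.Quit();
                Marshal.ReleaseComObject(wordApp);
            }
        }
    }
}
```

Any Range or Paragraph objects obtained inside the callback stop being usable once the document is closed, so values such as the text should be copied out before the callback returns.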

Upvotes: 1
