Reputation: 525
The question would be simple, but one extra requirement has turned it into a big headache for me. The catch is that I do not need each highlighted "word" but the highlighted "phrases" from the Word file. I have written the following code:
using System;
using System.Collections.Generic;
using System.Windows.Forms;
using Word = Microsoft.Office.Interop.Word;

private void button1_Click(object sender, EventArgs e)
{
    try
    {
        Word.ApplicationClass wordObject = new Word.ApplicationClass();
        wordObject.Visible = false;
        object file = "D:\\mywordfile.docx";
        object nullobject = System.Reflection.Missing.Value;
        Word.Document thisDoc = wordObject.Documents.Open(ref file, ref nullobject, ref nullobject, ref nullobject, ref nullobject, ref nullobject, ref nullobject, ref nullobject, ref nullobject, ref nullobject, ref nullobject, ref nullobject, ref nullobject, ref nullobject, ref nullobject, ref nullobject);
        List<string> wordHighlights = new List<string>();
        // Let myRange be some Range which has my text under consideration
        int prevEnd = 0;
        string tempStr = "";
        foreach (Word.Range cellWordRange in myRange.Words)
        {
            // Compare against the enum value rather than its string name
            if (cellWordRange.HighlightColorIndex == Word.WdColorIndex.wdNoHighlight)
            {
                continue;
            }
            int thisStart = cellWordRange.Start;
            int thisEnd = cellWordRange.End;
            string cellWordText = cellWordRange.Text.Trim();
            if (cellWordText.Length >= 1) // valid word length, non-whitespace
            {
                if (thisStart == prevEnd) // this word is contiguously highlighted with the previous highlighted word
                {
                    tempStr = String.Concat(tempStr, " " + cellWordText); // append to the current phrase
                }
                else
                {
                    if (tempStr.Length > 0) // a phrase was built in previous iterations
                    {
                        wordHighlights.Add(tempStr);
                    }
                    tempStr = cellWordText; // start a new phrase
                }
            }
            prevEnd = thisEnd;
        }
        // Store the last phrase, which the loop itself never adds
        if (tempStr.Length > 0)
        {
            wordHighlights.Add(tempStr);
        }
        foreach (string highlightedString in wordHighlights)
        {
            MessageBox.Show(highlightedString);
        }
    }
    catch (Exception j)
    {
        MessageBox.Show(j.Message);
    }
}
Now consider the following text:
Le thé vert a un rôle dans la diminution du cholestérol, la combustion des graisses, la prévention du diabète et les AVC, et conjurer la démence.
Now suppose someone highlighted "du cholestérol": my code obviously selects two words, "du" and "cholestérol". How can I make a continuously highlighted area appear as a single entry? I mean "du cholestérol" should be returned as one entity in the List. Is there some logic by which we scan the document char by char, mark the starting point of highlighting as the starting point of the selection, and the endpoint of highlighting as the end point of the selection?
P.S.: If there is a library with required capabilities in any other language, please let me know as the scenario is not language specific. I need only to get the desired results somehow.
EDIT: Modified the code with Start and End as suggested by Oliver Hanappi. But the problem remains that if there are two such highlighted phrases separated only by a white space, the program treats both phrases as one, simply because it reads the Words and not the spaces. Maybe some edits are required around if (thisStart == prevEnd)?
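The char-by-char idea mentioned above can be sketched without Word at all. This is only a minimal sketch under stated assumptions: HighlightRuns and Extract are hypothetical names, and the isHighlighted flags are assumed to have been collected beforehand (for example from the HighlightColorIndex of each character in the range). Because an unhighlighted space ends the current run, two phrases separated only by a space stay separate.

```csharp
using System.Collections.Generic;
using System.Text;

public static class HighlightRuns
{
    // Returns every maximal run of consecutive highlighted characters.
    // isHighlighted[i] tells whether text[i] is highlighted (hypothetical input,
    // assumed to have the same length as text).
    public static List<string> Extract(string text, bool[] isHighlighted)
    {
        var runs = new List<string>();
        var current = new StringBuilder();

        for (int i = 0; i < text.Length; i++)
        {
            if (isHighlighted[i])
            {
                // Still inside a highlighted area: extend the current run
                current.Append(text[i]);
            }
            else if (current.Length > 0)
            {
                // Highlighting just ended: store the run and start over
                runs.Add(current.ToString());
                current.Clear();
            }
        }

        // Store a run that reaches the end of the text
        if (current.Length > 0)
        {
            runs.Add(current.ToString());
        }
        return runs;
    }
}
```

Reading the flags one character at a time through the interop layer is likely to be slow on large documents, so this is better seen as an illustration of the logic than as a recommended approach.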
Upvotes: 3
Views: 6416
Reputation: 2291
grahamj42's answer is OK; I've translated it to C#. If you want to find matches in the whole document, use:
Word.Range content = thisDoc.Content;
But remember that this covers only the main story range. If you want to match words in, for example, footnotes, you need to use:
Word.StoryRanges stories = null;
stories = thisDoc.StoryRanges;
Word.Range footnoteRange = stories[Word.WdStoryType.wdFootnotesStory];
My code:
// Requires: using System.Runtime.InteropServices; (for Marshal)
// "range" is the Word.Range to search, e.g. the content or footnoteRange above
Word.Find find = null;
Word.Range duplicate = null;
try
{
    duplicate = range.Duplicate;
    find = duplicate.Find;
    find.Highlight = 1; // match highlighted text only
    object str = "";
    object missing = System.Type.Missing;
    object objTrue = true;
    object replace = Word.WdReplace.wdReplaceNone;
    bool result = find.Execute(ref str, ref missing, ref missing, ref missing, ref missing, ref missing, ref missing, ref missing, ref objTrue, ref str, ref replace, ref missing, ref missing, ref missing, ref missing);
    while (result)
    {
        // code to store range text
        // use duplicate.Text property
        result = find.Execute(ref str, ref missing, ref missing, ref missing, ref missing, ref missing, ref missing, ref missing, ref objTrue, ref str, ref replace, ref missing, ref missing, ref missing, ref missing);
    }
}
finally
{
    // Release the COM objects created above
    if (find != null) Marshal.ReleaseComObject(find);
    if (duplicate != null) Marshal.ReleaseComObject(duplicate);
}
Upvotes: 0
Reputation: 525
I started with Oliver's logic, things seemed to be fine, but testing revealed that this method does not take into account white spaces. So highlighted phrases separated by just a space were not getting separated. I used the VB code approach provided by grahamj42 and added it as a class library and included the reference in my C# windows forms project.
My C# Windows form project:
using Word = Microsoft.Office.Interop.Word;
and then I changed the try block to:
Word.ApplicationClass wordObject = new Word.ApplicationClass();
wordObject.Visible = false;
object file = "D:\\mywordfile.docx";
object nullobject = System.Reflection.Missing.Value;
Word.Document thisDoc = wordObject.Documents.Open(ref file, ref nullobject, ref nullobject, ref nullobject, ref nullobject, ref nullobject, ref nullobject, ref nullobject, ref nullobject, ref nullobject, ref nullobject, ref nullobject, ref nullobject, ref nullobject, ref nullobject, ref nullobject);
List<string> wordHighlights = new List<string>();
// Let myRange be some Range, which has already been selected programmatically here
WordMacroClasses.Highlighting macroObj = new WordMacroClasses.Highlighting();
List<string> hiWords = macroObj.HighlightRange(myRange, myRange.End);
foreach (string hitext in hiWords)
{
    wordHighlights.Add(hitext);
}
And here is the Range.Find code in the VB class library, which simply accepts the Range and its End position and returns a List(Of String):
Public Class Highlighting
    Public Function HighlightRange(ByVal myRange As Microsoft.Office.Interop.Word.Range, ByVal rangeLimit As Integer) As List(Of String)
        Dim Highlights As New List(Of String)
        With myRange.Find
            .Highlight = True
            Do While .Execute = True ' loop while highlighted text is found
                ' keep only matches that start before the original end of the range
                If (myRange.Start < rangeLimit) Then Highlights.Add(myRange.Text)
            Loop
        End With
        Return Highlights
    End Function
End Class
Upvotes: -1
Reputation: 2762
You can do this far more efficiently with Find, which searches more quickly and selects all the contiguous text that matches. See the reference here: http://msdn.microsoft.com/en-us/library/office/bb258967%28v=office.12%29.aspx
Here is an example in VBA which prints all occurrences of highlighted text:
Sub TestFind()
    Dim myRange As Range
    Set myRange = ActiveDocument.Content ' search entire document
    With myRange.Find
        .Highlight = True
        Do While .Execute = True ' loop while highlighted text is found
            Debug.Print myRange.Text ' myRange is changed to contain the found text
        Loop
    End With
End Sub
Hope this helps you understand better.
Upvotes: 2
Reputation: 12346
You can look at the Start and End properties of the ranges and check whether the end of the first range equals the start of the second.
As an alternative, you may move the range by one word (see WdUnits.wdWord) and then check whether the moved start and end equal the start and end of the second word.
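The Start/End comparison could be sketched roughly as follows. This is a minimal sketch, not tested against Word: WordSpan and PhraseGrouper are hypothetical names, and each WordSpan is assumed to hold the Start, End and Text values read from one highlighted word range.

```csharp
using System.Collections.Generic;

// Hypothetical stand-in for one highlighted word: the Start, End and Text
// values that would be read from a Word.Range.
public sealed class WordSpan
{
    public int Start { get; }
    public int End { get; }
    public string Text { get; }

    public WordSpan(int start, int end, string text)
    {
        Start = start;
        End = end;
        Text = text;
    }
}

public static class PhraseGrouper
{
    // Joins spans into phrases: a span extends the current phrase only when
    // it starts exactly where the previous span ended.
    public static List<string> Group(IEnumerable<WordSpan> highlightedSpans)
    {
        var phrases = new List<string>();
        string current = null;
        int prevEnd = -1;

        foreach (WordSpan span in highlightedSpans)
        {
            string text = span.Text.Trim();
            if (text.Length > 0)
            {
                if (current != null && span.Start == prevEnd)
                {
                    current += " " + text; // contiguous: same phrase
                }
                else
                {
                    if (current != null) phrases.Add(current);
                    current = text; // gap: start a new phrase
                }
            }
            prevEnd = span.End;
        }

        // Store the final phrase, which is easy to forget after the loop
        if (current != null) phrases.Add(current);
        return phrases;
    }
}
```

With spans such as ("du ", 50, 53) and ("cholestérol", 53, 64) the two words end up in one entry, while a later span that starts after a gap begins a new entry.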
Upvotes: 1