Reputation: 2100
I need to search through a word document for a string, and return the "offset" of the first character. What I am unsure about is how to account for newlines. If the document consists of:
Hi
World.
What is the offset of 'W'? Is it 2, since the offset of 'i' is 1? Or is it 3, because the hidden '\n' counts as a character? What if the document uses '\r\n' line endings? Is there a standard way to deal with this in Java?
Upvotes: 1
Views: 1651
Reputation: 4066
\r and \n are characters too, and they increase indexes like any other character. The offset of W is 3 when only \n is used.
If you want to be sure about the newline characters, remove all \r from your text before processing it.
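To make the counting concrete, here is a minimal sketch that assumes the document text has already been extracted into a Java String (reading the file itself is not shown). The class name NewlineOffsetDemo and the helper offsetOf are made up for illustration; only String.indexOf is a real library call.

```java
public class NewlineOffsetDemo {
    // Hypothetical helper: zero-based index of the first occurrence of needle, or -1 if absent.
    static int offsetOf(String text, String needle) {
        return text.indexOf(needle);
    }

    public static void main(String[] args) {
        String unixStyle = "Hi\nWorld.";       // H=0, i=1, \n=2, W=3
        String windowsStyle = "Hi\r\nWorld.";  // H=0, i=1, \r=2, \n=3, W=4

        System.out.println(offsetOf(unixStyle, "W"));    // prints 3
        System.out.println(offsetOf(windowsStyle, "W")); // prints 4
    }
}
```

So the same visible text gives a different offset depending on the line endings, which is why stripping or normalizing \r first makes the result predictable.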
Upvotes: 0
Reputation: 16037
I think you should first check with whoever originally defined the task (return the "offset" of the first character), because it all depends on how you intend to use the offset value afterwards.
I, on the other hand, would count all "special" characters; that is, I would count \r and \n as well.
Upvotes: 0
Reputation: 516
The answer is normalization:
test.replaceAll("\r", "").indexOf('W')
3
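A slightly more defensive variant, as a sketch only: instead of deleting every \r, it maps both \r\n and a lone \r to \n, so a document that uses bare \r line endings still counts one character per line break. This is a swapped-in variation, not the one-liner above, and the names NormalizedSearch, normalizeNewlines and normalizedOffsetOf are hypothetical. Note that the returned offset refers to the normalized text, not to the original bytes.

```java
public class NormalizedSearch {
    // Converts \r\n and lone \r line endings to a single \n.
    static String normalizeNewlines(String text) {
        return text.replace("\r\n", "\n").replace('\r', '\n');
    }

    // Offset of needle in the normalized text, or -1 if it is not found.
    static int normalizedOffsetOf(String text, String needle) {
        return normalizeNewlines(text).indexOf(needle);
    }

    public static void main(String[] args) {
        System.out.println(normalizedOffsetOf("Hi\nWorld.", "W"));   // prints 3
        System.out.println(normalizedOffsetOf("Hi\r\nWorld.", "W")); // prints 3
        System.out.println(normalizedOffsetOf("Hi\rWorld.", "W"));   // prints 3
    }
}
```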
Upvotes: 3