user1002973

Reputation: 2100

How to find the offset of a character in a multi-line document

I need to search through a word document for a string, and return the "offset" of the first character. What I am unsure about is how to account for newlines. If the document consists of:

Hi

World.

What is the offset of 'W'? Is it 2, since the offset of 'i' is 1? Or is it 3, because the hidden '\n' could be considered a character? What if the document uses '\r\n' line endings? Is there a standard way to deal with this (Java)?

Upvotes: 1

Views: 1651

Answers (3)

Farnabaz

Reputation: 4066

\r and \n are characters too, and they advance the index like any other character. The offset of 'W' is 3 when only \n is used as the line ending.
If you want to be sure how newline characters are counted, remove all \r characters from your text before processing it.
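
To illustrate the counting described above, here is a minimal sketch in Java. The class name NewlineOffsetDemo and the helper offsetOf are made up for this example, and it assumes the document text has already been read into a String with its line endings intact.

```java
class NewlineOffsetDemo {
    // Hypothetical helper: zero-based index of the first occurrence of target, or -1 if absent.
    static int offsetOf(String text, String target) {
        return text.indexOf(target);
    }

    public static void main(String[] args) {
        String unixText = "Hi\nWorld.";      // LF line ending: one extra character
        String windowsText = "Hi\r\nWorld."; // CRLF line ending: two extra characters

        System.out.println(offsetOf(unixText, "W"));    // prints 3
        System.out.println(offsetOf(windowsText, "W")); // prints 4
    }
}
```

The same text gives a different offset depending on the line ending, which is why stripping \r first makes the result predictable.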

Upvotes: 0

bpgergo

Reputation: 16037

I think you should first check with whoever originally defined the task ("return the 'offset' of the first character"), because it all depends on how you intend to use the offset value afterwards.

I, on the other hand, would count all "special" characters; that is, I would count \r and \n as well.

Upvotes: 0

Maciek

Reputation: 516

The answer is normalization:

test.replaceAll("\r", "").indexOf('W')
3
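
A slightly fuller sketch of the same idea, under the assumption that the text is already in a String. The names NormalizedOffsetDemo, normalizeNewlines and normalizedOffsetOf are hypothetical. It uses String.replace instead of replaceAll, since no regular expression is needed here.

```java
class NormalizedOffsetDemo {
    // Hypothetical helper: drops every carriage return so that "\r\n" becomes "\n".
    // Note that a lone "\r" (old Mac style) is removed entirely, not turned into "\n".
    static String normalizeNewlines(String text) {
        return text.replace("\r", "");
    }

    // Offset of target within the normalized text, or -1 if it does not occur.
    static int normalizedOffsetOf(String text, String target) {
        return normalizeNewlines(text).indexOf(target);
    }

    public static void main(String[] args) {
        System.out.println(normalizedOffsetOf("Hi\nWorld.", "World"));   // prints 3
        System.out.println(normalizedOffsetOf("Hi\r\nWorld.", "World")); // prints 3
    }
}
```

Keep in mind that the result is an offset into the normalized text. If the value is later used to seek into the original file with \r\n line endings, it will be off by one for every line break before the match.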

Upvotes: 3
