Reputation: 1
How do I read this sentence and parse it using Scanner to get the output below?
Input: "it is red i.e. RED. not read."
Output: it is red i.e. RED not read
I tried the following, but it doesn't remove the periods at the end of the words:
Scanner lineReader = new Scanner(scanner.nextLine());
lineReader.useDelimiter(("\\s+(\\W*\\s)?"));
Edit: let me change the requirement. How do I remove all punctuation marks from the input text, except when the mark is a period (.) between two letters, as in "i.e."?
Upvotes: 0
Views: 4953
Reputation: 40753
"(?<!i\\.e)\\.? |\\.$"
should do the trick.
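As a rough sketch of how this delimiter could be plugged into the Scanner from the question: the class name DelimiterDemo and the helper tokenize are made up for illustration, and the input is the sample sentence from the question.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Scanner;

class DelimiterDemo {
    // Delimiter from the answer above, written as a Java string literal.
    static final String DELIMITER = "(?<!i\\.e)\\.? |\\.$";

    // Hypothetical helper (not from the original post): reads every token
    // from one line and rejoins the tokens with single spaces.
    static String tokenize(String line) {
        List<String> tokens = new ArrayList<>();
        try (Scanner lineReader = new Scanner(line)) {
            lineReader.useDelimiter(DELIMITER);
            while (lineReader.hasNext()) {
                tokens.add(lineReader.next());
            }
        }
        return String.join(" ", tokens);
    }

    public static void main(String[] args) {
        // Sample sentence from the question.
        System.out.println(tokenize("it is red i.e. RED. not read."));
    }
}
```

With the sample sentence this should print the output the question asks for: it is red i.e. RED not read. The pattern hard-codes "i.e", so other abbreviations are not covered.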
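In English, this regex says a delimiter is either of the following: a space, optionally preceded by a period, as long as the match does not begin directly after "i.e"; or a period at the end of the input.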
With regard to your edit, try "((?<=\\s\\w{1,10})[^\\w\\s])?\\s|[^\\w\\s]$"
[^\\w\\s] means any character that is not a letter, digit, underscore, or whitespace (i.e. punctuation).
((?<=\\s\\w{1,10})[^\\w\\s])?\\s means a whitespace character that may be preceded by a punctuation mark, provided there is no other punctuation between that mark and the previous whitespace. That is, it will not match the .[space] in e.g.[space] because there is a full stop between the e and the g.
The lookbehind (?<=\\s\\w{1,10}) is required to have a maximum length, and so may not use the zero-or-more or one-or-more operators (* and +). I put an arbitrary limit of 10 because I don't know of any words or abbreviations that contain punctuation and are more than a few characters long. Note that the same limit applies to the word in front of the punctuation mark, so a word longer than 10 characters keeps its trailing punctuation; raise the bound if you expect longer words.
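A minimal sketch of the edited-requirement regex used as a Scanner delimiter follows. The class name PunctuationDelimiterDemo and the helper tokenize are hypothetical names chosen for this example; only the pattern itself comes from the answer.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Scanner;

class PunctuationDelimiterDemo {
    // Pattern from the answer: whitespace, optionally preceded by one
    // punctuation mark that follows a whitespace-delimited run of 1 to 10
    // word characters, or a punctuation mark at the end of the input.
    static final String DELIMITER =
            "((?<=\\s\\w{1,10})[^\\w\\s])?\\s|[^\\w\\s]$";

    // Hypothetical helper: collects the tokens of one line and rejoins
    // them with single spaces so the result is easy to inspect.
    static String tokenize(String line) {
        List<String> tokens = new ArrayList<>();
        try (Scanner lineReader = new Scanner(line)) {
            lineReader.useDelimiter(DELIMITER);
            while (lineReader.hasNext()) {
                tokens.add(lineReader.next());
            }
        }
        return String.join(" ", tokens);
    }

    public static void main(String[] args) {
        System.out.println(tokenize("it is red i.e. RED. not read. e.g. 1,2, done!"));
    }
}
```

Tokens such as "i.e." and "e.g." survive intact because their final period is preceded by another period rather than by a plain run of word characters. For the same reason "1,2," keeps its trailing comma, which may or may not be what you want.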
Edit: I tested the new regex on the input "it is red i.e. RED. not read. e.g. 1,2, done!"
and it produced:
Upvotes: 1