Reputation: 470
My input string contains a mix of line separators: '\r\n', '\r', and '\n'. I want to split the string and keep each line separator attached to the substring that precedes it. I followed existing posts, including
How to split a string, but also keep the delimiters?
and came up with something like:
String input = "1 dog \r\n 2 cat";
String[] output = input.split("(?<=((\\r\\n)|\\r|\\n))");
The output is ["1 dog \r", "\n", " 2 cat"], but the desired output is ["1 dog \r\n", " 2 cat"].
If I change the input to either String input = "1 dog \r 2 cat"; or String input = "1 dog \n 2 cat"; my code produces the desired output. Please advise. Thanks in advance.
Upvotes: 4
Views: 2223
Reputation: 163362
You get the result ["1 dog \r", "\n", " 2 cat"] because your pattern uses an alternation that matches either (\r\n), \r, or \n.
When \r\n is encountered in the example string, the lookbehind assertion is true right after \r, so the string is split there for the first time. The assertion is then true again after \n, so the string is split a second time.
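To make the double split visible, here is a minimal sketch that runs the pattern from the question and prints each piece with the control characters escaped. The class name SplitTwiceDemo and the escape helper are illustrative names, not part of the original code.

```java
class SplitTwiceDemo {
    // Hypothetical helper: render \r and \n as visible escape sequences.
    static String escape(String s) {
        return s.replace("\r", "\\r").replace("\n", "\\n");
    }

    // Splits with the lookbehind pattern from the question.
    static String[] splitOriginal(String input) {
        return input.split("(?<=((\\r\\n)|\\r|\\n))");
    }

    public static void main(String[] args) {
        for (String part : splitOriginal("1 dog \r\n 2 cat")) {
            System.out.println("[" + escape(part) + "]");
        }
        // prints:
        // [1 dog \r]
        // [\n]
        // [ 2 cat]
    }
}
```

The lookbehind is checked at every position, so it succeeds once with \r immediately to the left and once more with \n immediately to the left.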
Instead, you can use \R in the positive lookbehind to assert that what is on the left is a Unicode newline sequence:
String input = "1 dog \r\n 2 cat";
String[] output = input.split("(?<=\\R)");
Another option to fix your regex is to make it an atomic group:
(?<=(?>\\r\\n|\\r|\\n))
According to this post, with an atomic group, once \r\n has been matched as a whole the engine does not backtrack to try \r on its own, so the lookbehind cannot succeed between \r and \n.
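As a rough sketch of how the atomic-group variant could be dropped into the split call, the snippet below applies it to an input containing all three separator styles. The class name AtomicLookbehindDemo, the escape helper, and the sample input are assumptions made for illustration.

```java
class AtomicLookbehindDemo {
    // Hypothetical helper: render \r and \n as visible escape sequences.
    static String escape(String s) {
        return s.replace("\r", "\\r").replace("\n", "\\n");
    }

    // Lookbehind whose alternation is wrapped in an atomic group (?>...),
    // so a matched \r\n is not given back to retry \r alone.
    static String[] splitAtomic(String input) {
        return input.split("(?<=(?>\\r\\n|\\r|\\n))");
    }

    public static void main(String[] args) {
        String input = "1 dog \r\n 2 cat \r 3 fish \n 4 bird";
        for (String part : splitAtomic(input)) {
            System.out.println("[" + escape(part) + "]");
        }
    }
}
```

Each printed piece should end with its complete separator, with \r\n staying in one piece.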
Upvotes: 2
Reputation: 12438
If you split your string with the following regex, it will work as intended: (?<=\\r\\n|\\r(?!\\n)|\\n)
What happens with your regex is that when \r\n is encountered, the lookbehind assertion (?<=\r) is already true, so the string is split just after \r.
This is why I added a negative lookahead (?!\n) after \r, to enforce that the character following \r is not \n. This prevents the split between \r and \n and keeps the pair together.
Demo: https://regex101.com/r/H6PNmY/1/ (where I have replaced \r by a and \n by b for readability)
When you put this back in your code:
String input = "1 dog \r\n 2 cat, 1 car \r 2 planes, 1 apple \n 2 peaches";
String[] output = input.split("(?<=\\r\\n|\\r(?!\\n)|\\n)");
for (int i = 0; i < output.length; i++)
{
    printASCII(output[i]);
    System.out.println("===");
}
with printASCII defined as:
public static void printASCII(String in)
{
    for (int i = 0; i < in.length(); i++)
        System.out.println("The ASCII value of " + in.charAt(i) + " = " + (int) in.charAt(i));
}
It gives you the following output:
The ASCII value of 1 = 49
The ASCII value of = 32
The ASCII value of d = 100
The ASCII value of o = 111
The ASCII value of g = 103
The ASCII value of = 32
The ASCII value of
= 13
The ASCII value of
= 10
===
The ASCII value of = 32
The ASCII value of 2 = 50
The ASCII value of = 32
The ASCII value of c = 99
The ASCII value of a = 97
The ASCII value of t = 116
The ASCII value of , = 44
The ASCII value of = 32
The ASCII value of 1 = 49
The ASCII value of = 32
The ASCII value of c = 99
The ASCII value of a = 97
The ASCII value of r = 114
The ASCII value of = 32
The ASCII value of
= 13
===
The ASCII value of = 32
The ASCII value of 2 = 50
The ASCII value of = 32
The ASCII value of p = 112
The ASCII value of l = 108
The ASCII value of a = 97
The ASCII value of n = 110
The ASCII value of e = 101
The ASCII value of s = 115
The ASCII value of , = 44
The ASCII value of = 32
The ASCII value of 1 = 49
The ASCII value of = 32
The ASCII value of a = 97
The ASCII value of p = 112
The ASCII value of p = 112
The ASCII value of l = 108
The ASCII value of e = 101
The ASCII value of = 32
The ASCII value of
= 10
===
The ASCII value of = 32
The ASCII value of 2 = 50
The ASCII value of = 32
The ASCII value of p = 112
The ASCII value of e = 101
The ASCII value of a = 97
The ASCII value of c = 99
The ASCII value of h = 104
The ASCII value of e = 101
The ASCII value of s = 115
===
That shows that the EOL characters are properly kept as you have requested.
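For a more compact check than the per-character dump, one possible sketch is to escape the separators and print one piece per line. The class name KeepSeparatorCheck and both helper methods are hypothetical names introduced here; only the regex comes from the answer above.

```java
import java.util.ArrayList;
import java.util.List;

class KeepSeparatorCheck {
    // Hypothetical helper: render \r and \n as visible escape sequences.
    static String escape(String s) {
        return s.replace("\r", "\\r").replace("\n", "\\n");
    }

    // Splits with the negative-lookahead regex and escapes each piece.
    static List<String> splitAndEscape(String input) {
        List<String> pieces = new ArrayList<>();
        for (String part : input.split("(?<=\\r\\n|\\r(?!\\n)|\\n)")) {
            pieces.add(escape(part));
        }
        return pieces;
    }

    public static void main(String[] args) {
        String input = "1 dog \r\n 2 cat, 1 car \r 2 planes, 1 apple \n 2 peaches";
        for (String piece : splitAndEscape(input)) {
            System.out.println("[" + piece + "]");
        }
        // prints:
        // [1 dog \r\n]
        // [ 2 cat, 1 car \r]
        // [ 2 planes, 1 apple \n]
        // [ 2 peaches]
    }
}
```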
ASCII table: https://www.ibm.com/support/knowledgecenter/en/ssw_aix_72/com.ibm.aix.networkcomm/conversion_table.htm
Upvotes: 1