Reputation: 3
I have a .txt input file as follows:
Start "String" (100, 100) Test One:
Nextline 10;
Test Second Third(2, 4, 2, 4):
String "7";
String "8";
Test "";
End;
End.
I intend to read this file in as one String and then split it on certain delimiters. This code gets me close to the desired output:
String tr= entireFile.replaceAll("\\s+", "");
String[] input = tr.split("(?<=[(,):;.])|(?=[(,):;.])|(?=\\p{Upper})");
My current output is:
Start"
String"
(
100
,
100
)
Test
One
:
Nextline10
;
Test
Second
Third
(
2
,
4
,
2
,
4
)
:
String"7"
;
String"8"
;
Test""
;
End
;
End
.
However, I'm having trouble treating quoted items, including an empty pair of quotes "", as separate tokens: "String", "7", and "" should each be on their own line. Is there a way to do this with regex? My expected output is below. Thanks for any help.
Start
"String"
(
100
,
100
)
Test
One
:
Nextline
10
;
Test
Second
Third
(
2
,
4
,
2
,
4
)
:
String
"7"
;
String
"8"
;
Test
""
;
End
;
End
.
Upvotes: 0
Views: 277
Reputation: 76
Here's the regex I came up with:
String[] input = entireFile.split(
"\\s+|" + // split on (and discard) any run of whitespace, or
"(?<=\\()|" + // split right after a ( (positive lookbehind), or
"(?=[,).:;])|" + // split right before any of , ) . : ; (positive lookahead), or
"((?<!\\s)(?=\\())"); // split right before a ( that is not preceded by whitespace (negative lookbehind)
To understand all that positive/negative lookahead/lookbehind terminology, take a look at this answer.
Note that you should apply this split directly to the file contents without stripping the whitespace first, so remove this line:
String tr= entireFile.replaceAll("\\s+", "");
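As a rough check, here is a minimal sketch that applies the split above to the sample input from the question and prints one token per line. The class name `SplitDemo`, the constant `DELIMITERS`, and the helper `tokenize` are hypothetical names chosen for illustration, and the input is hard-coded rather than read from a file.

```java
public class SplitDemo {
    // Same split pattern as in the answer, kept unchanged.
    static final String DELIMITERS =
            "\\s+|"                 // split on (and discard) any run of whitespace
          + "(?<=\\()|"             // split right after "("
          + "(?=[,).:;])|"          // split right before , ) . : ;
          + "((?<!\\s)(?=\\())";    // split right before "(" unless whitespace precedes it
                                    // (the whitespace alternative has already split there,
                                    // so a second split would yield an empty token)

    // Hypothetical helper: splits the whole file contents into tokens.
    static String[] tokenize(String entireFile) {
        return entireFile.split(DELIMITERS);
    }

    public static void main(String[] args) {
        // Sample input from the question, with the whitespace left intact.
        String entireFile = String.join("\n",
                "Start \"String\" (100, 100) Test One:",
                "Nextline 10;",
                "Test Second Third(2, 4, 2, 4):",
                "String \"7\";",
                "String \"8\";",
                "Test \"\";",
                "End;",
                "End.");

        for (String token : tokenize(entireFile)) {
            System.out.println(token);
        }
    }
}
```

One limitation to keep in mind: the pattern splits on whitespace and on , ) . : ; wherever they occur, so a quoted string that contains any of them (for example "a b") would be broken into several tokens. The sample input has no such strings.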
Upvotes: 4