Java String.split() spiralling out of control

Question

I am trying to split strings (some can be very long, whole paragraphs) on whitespace (spaces, line breaks, tabs). We currently use String.split("\\s++"). The previous version of the project, which we are now updating, simply used StringTokenizer. String.split("\\s++") worked fine in all our testing and with all our beta testers.
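
A minimal sketch of the two approaches described above, for comparison. The class name, method names, and sample input are made up for illustration, and the delimiter set passed to StringTokenizer is an assumption rather than the project's actual code. The regex is written as a Java string literal, so the backslash is doubled.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import java.util.StringTokenizer;

public class WhitespaceSplitSketch {

    // Current approach: split on runs of whitespace using a possessive quantifier.
    static String[] splitWithRegex(String text) {
        return text.split("\\s++");
    }

    // Previous approach: StringTokenizer with an explicit set of whitespace
    // delimiters (assumed here: space, tab, newline, carriage return, form feed).
    static String[] splitWithTokenizer(String text) {
        StringTokenizer tokenizer = new StringTokenizer(text, " \t\n\r\f");
        List<String> tokens = new ArrayList<>();
        while (tokenizer.hasMoreTokens()) {
            tokens.add(tokenizer.nextToken());
        }
        return tokens.toArray(new String[0]);
    }

    public static void main(String[] args) {
        // Hypothetical input mixing spaces, a tab, and a Windows line break.
        String sample = "some  user\ttext\r\nwith mixed   whitespace";
        System.out.println(Arrays.toString(splitWithRegex(sample)));
        System.out.println(Arrays.toString(splitWithTokenizer(sample)));
    }
}
```

On this sample both methods return the same six tokens. One difference to keep in mind: if the input starts with whitespace, String.split returns an empty first element, while StringTokenizer skips leading delimiters.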

As soon as we released it to a wider group of users, it ran for a while and then consumed all server resources. From what I've researched, this appears to be catastrophic backtracking. We get errors like:

    ....was in progress with java.base@11.0.5/java.util.regex.Pattern$GroupHead.match(Pattern.java:4804)
    java.base@11.0.5/java.util.regex.Pattern$Start.match(Pattern.java:3619)
    java.base@11.0.5/java.util.regex.Matcher.search(Matcher.java:1729)
    java.base@11.0.5/java.util.regex.Matcher.find(Matcher.java:746)
    java.base@11.0.5/java.util.regex.Pattern.split(Pattern.java:1264)
    java.base@11.0.5/java.lang.String.split(String.java:2317)

Users can type some crazy text. What is the best way to split strings that could be anywhere from 10 to 1000 characters long? I have hit a brick wall. I have been trying different patterns (regex is not my strongest area) for the past 4 days without long-term success.

Answers (1)