Sam Meow

Reputation: 211

Split on Regular Expression per Path

If I have this:

thisisgibberish  1234 /hello/world/
more gibberish 43/7 /good/timing/
just onemore    8888  /thanks/mate

What would the regular expression inside the Java String.split() method be to obtain the path on each line?

i.e.

[0]: /hello/world/
[1]: /good/timing/
[2]: /thanks/mate

Doing

myString.split("/[a-zA-Z]")

causes a split at every /h, /w, /g, /t, and /m.

How would I write a regular expression that splits only once per line and keeps only the paths?

Thanks in advance.

Upvotes: 2

Views: 100

Answers (3)

Bohemian

Reputation: 424983

You must first remove the leading junk, then split on the intervening junk:

String[] paths = str.replaceAll("^.*? (?=/[a-zA-Z])", "")
    .split("(?m)((?<=[a-zA-Z]/|[a-zA-Z])\\s|^).*? (?=/[a-zA-Z])");

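One important point here is the use of (?m), the switch that turns on multiline mode, so that ^ matches at the start of every line rather than only at the start of the input. It does not make the dot match newlines (that is (?s)); here the newline between lines is consumed by \s.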

Here's some test code:

String str = "thisisgibberish  1234 /hello/world/\nmore gibberish 43/7 /good/timing/\njust onemore    8888  /thanks/mate";
String[] paths = str.replaceAll("^.*? (?=/[a-zA-Z])", "")
    .split("(?m)((?<=[a-zA-Z]/|[a-zA-Z])\\s|^).*? (?=/[a-zA-Z])");
System.out.println(Arrays.toString(paths)); // requires import java.util.Arrays

Output (achieving requirements):

[/hello/world/, /good/timing/, /thanks/mate]
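As a side note on the inline flags, here is a minimal sketch of how (?m) and (?s) differ in java.util.regex. The class and method names (FlagDemo, countMatches) are illustrative and not part of the answer above.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class FlagDemo {
    // Counts how many times the regex is found in the input.
    static int countMatches(String regex, String input) {
        Matcher m = Pattern.compile(regex).matcher(input);
        int count = 0;
        while (m.find()) {
            count++;
        }
        return count;
    }

    public static void main(String[] args) {
        String input = "a\nb";
        // Default mode: ^ matches only at the start of the whole input.
        System.out.println(countMatches("^.", input));     // 1
        // (?m) multiline mode: ^ also matches right after a line terminator.
        System.out.println(countMatches("(?m)^.", input)); // 2
        // The dot does not match '\n' unless (?s) (DOTALL) is enabled.
        System.out.println(input.matches("a.b"));          // false
        System.out.println(input.matches("(?s)a.b"));      // true
    }
}
```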

Upvotes: 0

boxed__l

Reputation: 1336

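This uses split() on one line at a time:

String[] split = myString.split(myString.substring(0, myString.lastIndexOf(" ")));

or, to get the path directly:

myString.split(myString.substring(0, myString.lastIndexOf(" ")))[1].trim(); // works for the current inputs; trim() drops the leading space left by the split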
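A hedged sketch of the same idea applied line by line. It assumes the path is whatever follows the last space on each line, and it wraps the prefix in Pattern.quote so that any regex metacharacters in the junk are treated literally. The names LastSpaceSplit, pathOf, and pathsOf are made up for this illustration.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Pattern;

class LastSpaceSplit {
    // Returns the text after the last space of a single line, using split().
    static String pathOf(String line) {
        int lastSpace = line.lastIndexOf(' ');
        if (lastSpace < 0) {
            return line; // no junk prefix to split off
        }
        // Everything up to and including the last space is the delimiter.
        String prefix = line.substring(0, lastSpace + 1);
        // Pattern.quote makes split() treat the prefix as a literal string.
        String[] parts = line.split(Pattern.quote(prefix));
        return parts[1]; // parts[0] is the empty string before the delimiter
    }

    // Applies pathOf to every line of a multi-line input.
    static List<String> pathsOf(String input) {
        List<String> paths = new ArrayList<>();
        for (String line : input.split("\n")) {
            paths.add(pathOf(line));
        }
        return paths;
    }

    public static void main(String[] args) {
        String input = "thisisgibberish  1234 /hello/world/\n"
                + "more gibberish 43/7 /good/timing/\n"
                + "just onemore    8888  /thanks/mate";
        System.out.println(pathsOf(input));
    }
}
```

If split() is not a hard requirement, line.substring(line.lastIndexOf(' ') + 1) gives the same result without building a regex.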

Upvotes: 0

Ibrahim Najjar

Reputation: 19423

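Why split? I think running a match is the better fit here. Try the following expression, which looks for a slash preceded by whitespace and then takes every following letter or slash:

(?<=\s)(/[a-zA-Z/]+)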

Regex101 Demo
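For illustration, a minimal Java sketch of the matching approach using Matcher.find(). It assumes a form of the expression in which the + quantifier applies to the character class, so a whole path is consumed in one match; the class and method names (PathExtractor, extractPaths) are hypothetical.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class PathExtractor {
    // A slash preceded by whitespace, followed by one or more letters or slashes.
    private static final Pattern PATH = Pattern.compile("(?<=\\s)/[a-zA-Z/]+");

    // Collects every match in the input, in order of appearance.
    static List<String> extractPaths(String input) {
        List<String> paths = new ArrayList<>();
        Matcher m = PATH.matcher(input);
        while (m.find()) {
            paths.add(m.group());
        }
        return paths;
    }

    public static void main(String[] args) {
        String input = "thisisgibberish  1234 /hello/world/\n"
                + "more gibberish 43/7 /good/timing/\n"
                + "just onemore    8888  /thanks/mate";
        // Prints [/hello/world/, /good/timing/, /thanks/mate]
        System.out.println(extractPaths(input));
    }
}
```

The lookbehind keeps the "/7" in "43/7" from matching, since it is preceded by a digit rather than whitespace. A path at the very start of the input would be missed for the same reason, so this fits only inputs shaped like the ones in the question.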

Upvotes: 3
