DKG

Reputation: 387

Regular expression to extract text in reverse order until 3rd instance of a character

I have a string in the format XXXX_YYYY_YYYYYYY_YYYYYYZZZZ

How can I extract the string starting from the end and working backwards until the third _ (underscore) is hit? Expected value: YYYY_YYYYYYY_YYYYYYZZZZ

I tried ((?:_[^_]*){3})$ and it seems to work, but it leaves an extra _ at the beginning, which I can probably remove in Java.

Is there any way to get the result without the _ at the beginning?

Upvotes: 6

Views: 761

Answers (5)

amekki

Reputation: 126

For performance reasons, it is better to use a regex here than String.split. Jan's answer is what you need.

Upvotes: 0

Tim Biegeleisen

Reputation: 522797

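If you reverse the string first, you can get away with a very simple regex, (.*)(_.*). Note that this strips only the first segment of the original string, so it gives the expected result when the input contains exactly three underscores: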

String input = "XXXX_YYYY_YYYYYYY_YYYYYYZZZZ";
input = new StringBuilder(input).reverse().toString().replaceAll("(.*)(_.*)", "$1");
input = new StringBuilder(input).reverse().toString();
System.out.println(input);

Output:

YYYY_YYYYYYY_YYYYYYZZZZ
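
For inputs with more than three underscores, one possible variant keeps the first three segments of the reversed string instead of dropping a single segment. This is only a sketch and not part of the original answer; the class name ReverseLastThree and the helper lastThree are made up for illustration.

```java
class ReverseLastThree {
    // Hypothetical helper: reverse, keep the first three segments, reverse back.
    static String lastThree(String input) {
        String reversed = new StringBuilder(input).reverse().toString();
        // Group 1 = three underscore-separated runs at the start; the rest is dropped.
        // If the reversed string has fewer than two underscores, nothing is replaced.
        String kept = reversed.replaceFirst("^([^_]*(?:_[^_]*){2}).*$", "$1");
        return new StringBuilder(kept).reverse().toString();
    }

    public static void main(String[] args) {
        System.out.println(lastThree("XXXX_YYYY_YYYYYYY_YYYYYYZZZZ")); // YYYY_YYYYYYY_YYYYYYZZZZ
        System.out.println(lastThree("A_B_C_D_E"));                    // C_D_E
        System.out.println(lastThree("XX_YYY"));                       // XX_YYY (no match, unchanged)
    }
}
```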

Upvotes: 1

Wiktor Stribiżew

Reputation: 627600

A non-regex approach is also possible:

import java.util.Arrays;
import java.util.List;

String s = "XXXX_YYYY_YYYYYYY_YYYYYYZZZZ";
List<String> r = Arrays.asList(s.split("_")); // Split by _ and get a List
r = r.subList(Math.max(r.size() - 3, 0), r.size()); // Grab last 3 elements
System.out.println(String.join("_", r));    // Join them with _
// => YYYY_YYYYYYY_YYYYYYZZZZ


If there are fewer than 3 elements after splitting, all of them are joined back together (e.g. XX_YYY stays XX_YYY).
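
The split/subList/join steps can be wrapped in a small method to check that edge case. This is a minimal sketch; the class name SplitLastThree and the helper lastThree are illustrative names, not part of the original answer.

```java
import java.util.Arrays;
import java.util.List;

class SplitLastThree {
    // Hypothetical helper wrapping the split / subList / join steps.
    static String lastThree(String s) {
        List<String> parts = Arrays.asList(s.split("_"));
        // Math.max keeps the start index at 0 when there are fewer than 3 parts.
        List<String> tail = parts.subList(Math.max(parts.size() - 3, 0), parts.size());
        return String.join("_", tail);
    }

    public static void main(String[] args) {
        System.out.println(lastThree("XXXX_YYYY_YYYYYYY_YYYYYYZZZZ")); // YYYY_YYYYYYY_YYYYYYZZZZ
        System.out.println(lastThree("XX_YYY"));                       // XX_YYY
    }
}
```

Keep in mind that String.split drops trailing empty strings, so an input ending in _ loses its empty last segment before the join.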

Upvotes: 1

Jan

Reputation: 13858

Like this:

        String line = "XXXX_YYYY_YYYYYYY_YYYYYYZZZZ";

        Pattern p = Pattern.compile("([^_]+(?:_[^_]*){2})$");
        Matcher m = p.matcher(line);
        if(m.find()) {
            System.out.println(m.group(1));
        }

Simply split your "three times" {3} into one segment without the leading _ and two segments that require it.
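
To see the effect of that change, the pattern from the question and the pattern above can be run side by side. This is a rough sketch; the class name LeadingUnderscoreDemo and the helper extract are made up for the comparison.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class LeadingUnderscoreDemo {
    // Hypothetical helper: returns group 1 of the first match, or null if nothing matches.
    static String extract(String regex, String input) {
        Matcher m = Pattern.compile(regex).matcher(input);
        return m.find() ? m.group(1) : null;
    }

    public static void main(String[] args) {
        String line = "XXXX_YYYY_YYYYYYY_YYYYYYZZZZ";
        // Pattern from the question: each of the three repetitions starts with "_".
        System.out.println(extract("((?:_[^_]*){3})$", line));      // _YYYY_YYYYYYY_YYYYYYZZZZ
        // Pattern from this answer: the first segment has no "_", the other two do.
        System.out.println(extract("([^_]+(?:_[^_]*){2})$", line)); // YYYY_YYYYYYY_YYYYYYZZZZ
    }
}
```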

Upvotes: 3

sp00m

Reputation: 48837

This one should suit your needs:

[^_]+(?:_[^_]+){2}$
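
In Java, this pattern could be used roughly as follows. This is a sketch only; the class name LastThreeSegments and the helper lastThree are illustrative. Because every segment uses +, inputs with an empty segment or with fewer than three segments do not match at all.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class LastThreeSegments {
    // One underscore-free run, then two "_run" groups, anchored at the end of the input.
    private static final Pattern LAST_THREE = Pattern.compile("[^_]+(?:_[^_]+){2}$");

    // Hypothetical helper: returns the last three segments, or null when there is no match.
    static String lastThree(String input) {
        Matcher m = LAST_THREE.matcher(input);
        return m.find() ? m.group() : null;
    }

    public static void main(String[] args) {
        System.out.println(lastThree("XXXX_YYYY_YYYYYYY_YYYYYYZZZZ")); // YYYY_YYYYYYY_YYYYYYZZZZ
        System.out.println(lastThree("XX_YYY"));                       // null (only two segments)
        System.out.println(lastThree("A_B_C_"));                       // null (empty last segment)
    }
}
```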


Upvotes: 5
