Anirudh

Reputation: 2247

Parsing text from the end (using regular expressions)

I have a seemingly simple problem, though I am unable to get my head around it.

Let's say I have the following string: 'abcabcabcabc' and I want to get the last occurrence of 'ab'. Is there a way I can do this without looping through all the other 'ab's from the beginning of the string?

I read about anchoring to the end of the string and then matching the required regular expression from there. I am unsure how to do this in Java (is it supported?).

Update: I guess I have caused a lot of confusion with my (over)simplified example. Let me try another one. Say I have a string like this: '12/08/2008 some_text 21/10/2008 some_more_text 15/12/2008 and_finally_some_more'. Here, I want the last date, and hence I need to use regular expressions. I hope this is a better example.

Thanks, Anirudh

Upvotes: 1

Views: 802

Answers (5)

PEZ

Reputation: 17004

This will give you the last date in group 1 of the match object.

.*(\d{2}/\d{2}/\d{4})
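In Java, the pattern above could be applied roughly as follows. This is only a minimal sketch: the class name LastDateGreedy and the helper lastDate are made up for illustration, and the sample string is the one from the question. The leading .* is greedy, so it consumes as much of the input as it can and then backtracks only far enough for the date sub-pattern to match, which leaves the last date in group 1.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

final class LastDateGreedy {
    // The greedy ".*" swallows as much input as possible, so group 1 captures the LAST date.
    private static final Pattern LAST_DATE = Pattern.compile(".*(\\d{2}/\\d{2}/\\d{4})");

    // Hypothetical helper: returns the last dd/mm/yyyy-shaped token, or null if there is none.
    static String lastDate(String input) {
        Matcher m = LAST_DATE.matcher(input);
        return m.find() ? m.group(1) : null;
    }

    public static void main(String[] args) {
        String sample = "12/08/2008 some_text 21/10/2008 some_more_text 15/12/2008 and_finally_some_more";
        System.out.println(lastDate(sample)); // prints 15/12/2008
    }
}
```

Note that this assumes single-line input: by default '.' does not match line terminators, so for text spanning several lines the pattern would likely need to be compiled with Pattern.DOTALL.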

Upvotes: 2

Chase Seibert

Reputation: 15851

For the date example, you could do this with the Pattern API and not in the regex itself. The basic idea is to get all the matches, then return the last one.

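import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// the class name is illustrative only
public class LastMatch {

    public static void main(String[] args) {

        // this may be overkill; you can replace it with a much simpler but more lenient version
        final String dateRegex = "\\b(0?[1-9]|[12][0-9]|3[01])[- /.](0?[1-9]|1[012])[- /.](19|20)?[0-9]{2}\\b";
        final String sample = "12/08/2008 some_text 21/10/2008 some_more_text 15/12/2008 and_finally_some_more";

        List<String> allMatches = getAllMatches(dateRegex, sample);
        // guard against an empty list so get() cannot throw when nothing matches
        if (!allMatches.isEmpty()) {
            System.out.println(allMatches.get(allMatches.size() - 1));
        }
    }

    // collects every match of regex in input, in order of appearance
    private static List<String> getAllMatches(final String regex, final String input) {

        final Matcher matcher = Pattern.compile(regex).matcher(input);
        final List<String> matches = new ArrayList<String>();
        while (matcher.find()) {
            matches.add(matcher.group());
        }
        return matches;
    }
}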

Upvotes: 0

Anirudh

Reputation: 2247

Firstly, thanks for all the answers.

Here is what I tried, and it worked for me:

Pattern pattern = Pattern.compile("(ab)(?!.*ab)");
Matcher matcher = pattern.matcher("abcabcabcd");
if(matcher.find()) {
  System.out.println(matcher.start() + ", " + matcher.end());
}

This displays the following:

6, 8

So, to generalize: <reg_ex>(?!.*<reg_ex>) should solve this problem. Here '(?!...)' is a negative lookahead: the match succeeds only if the expression inside the lookahead cannot be matched in the text that follows the preceding expression.
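Applied to the date example from the question, the same idea might look like the sketch below. The class name LastDateLookahead, the helper lastDate, and the simple \d{2}/\d{2}/\d{4} date pattern are assumptions chosen for illustration, not something tested against real-world data.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

final class LastDateLookahead {
    // Deliberately lenient date shape (assumption): two digits, two digits, four digits.
    private static final String DATE = "\\d{2}/\\d{2}/\\d{4}";

    // <reg_ex>(?!.*<reg_ex>): a date that is not followed, later on the same line, by another date.
    private static final Pattern LAST_DATE = Pattern.compile(DATE + "(?!.*" + DATE + ")");

    // Hypothetical helper: returns the last date-shaped token, or null if there is none.
    static String lastDate(String input) {
        Matcher m = LAST_DATE.matcher(input);
        return m.find() ? m.group() : null;
    }

    public static void main(String[] args) {
        String sample = "12/08/2008 some_text 21/10/2008 some_more_text 15/12/2008 and_finally_some_more";
        System.out.println(lastDate(sample)); // prints 15/12/2008
    }
}
```

If the sub-expression contains a top-level alternation, it would need to be wrapped in a non-capturing group, (?:...), before being concatenated like this. The lookahead is also re-evaluated at every candidate match, so on long inputs with many candidates this can do more work than simply iterating over the matches.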

Update: This page provides more information on 'not followed by' (negative lookahead) in regex.

Upvotes: 2

Jonas Elfström

Reputation: 31458

I do not understand what you are trying to do. Why only the last if they are all the same? Why a regular expression, and why not int pos = s.lastIndexOf(str)?
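For the original literal 'ab' case, the lastIndexOf suggestion could be sketched as follows. The class name LastIndexDemo and the wrapper lastOccurrence are hypothetical; String.lastIndexOf itself is part of the standard library and is documented as searching backward, so no regular expression is involved.

```java
final class LastIndexDemo {
    // Hypothetical wrapper: index of the last literal occurrence of needle, or -1 if absent.
    static int lastOccurrence(String haystack, String needle) {
        return haystack.lastIndexOf(needle);
    }

    public static void main(String[] args) {
        System.out.println(lastOccurrence("abcabcabcabc", "ab")); // prints 9
    }
}
```

This only works for fixed substrings; for something pattern-shaped like the dates in the updated question, a regular expression is still needed.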

Upvotes: 0

chills42

Reputation: 14533

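Pattern p = Pattern.compile("ab.*?$");
Matcher m = p.matcher("abcabcabcabc");
// matches() must consume the entire input, so here the match starts at the first "ab" (index 0), not the last one
boolean b = m.matches();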

Upvotes: 1
