RandomUser

Reputation: 4220

Java regex String parse, trying to figure out a pattern

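import java.io.File;
import java.util.Arrays;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

File file = new File("file-type-string-i-want-2000-01-01-01-01-01.conf.gz");
String fileName = file.getName();
// Greedy: matches from the first hyphen through the hyphen and 4-digit year
Matcher matcher = Pattern.compile("\\-(.*)\\-\\d{4}").matcher(fileName);
StringBuilder sb = new StringBuilder();
while (matcher.find()) {
    sb.append(matcher.group());
}
List<String> stringList = Arrays.asList(sb.toString().split("-"));
String nameFragment = null;
if (stringList.size() >= 2) {
    nameFragment = stringList.get(stringList.size() - 2);
}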

Desired result is to extract

string-iwant 

from strings that look like this

file-type-string-iwant-2000-01-01-01-01-01.conf.gz 

Unfortunately, "string-iwant" is a variable-length run of alphanumeric characters that contains exactly ONE hyphen and never starts with a hyphen. The date format is consistent and the year always comes right after the string, so my current approach is to match on the -year, but I'm having difficulty excluding the stuff at the beginning.

Thanks for any thoughts or ideas

Edit: updated strings

Upvotes: 4

Views: 515

Answers (3)

anubhava

Reputation: 785108

I would use a regex with a positive lookahead for this:

Pattern p = Pattern.compile("[^-]+-[^-]+(?=-\\d{4})");

This matches text containing exactly one hyphen, but only where it is immediately followed by a hyphen and a 4-digit year. The lookahead checks for the year without including it in the match.

After a successful matcher.find(), matcher.group(0) holds the matched text, which is string-iwant in this case.
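
For anyone who wants to try it, here is a minimal sketch of the lookahead approach. The class name LookaheadExample and the helper extract are made up for illustration, and returning null for a non-matching name is just one possible choice.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class LookaheadExample {
    // One hyphenated token that is immediately followed by "-" and a 4-digit year.
    private static final Pattern NAME = Pattern.compile("[^-]+-[^-]+(?=-\\d{4})");

    // Hypothetical helper: returns the first match, or null if the name does not fit.
    static String extract(String fileName) {
        Matcher m = NAME.matcher(fileName);
        return m.find() ? m.group(0) : null;
    }

    public static void main(String[] args) {
        // prints: string-iwant
        System.out.println(extract("file-type-string-iwant-2000-01-01-01-01-01.conf.gz"));
    }
}
```

Since find() scans left to right, the first position where the lookahead succeeds is the token right before the year, so the file-type prefix is skipped automatically.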

Upvotes: 0

inhan

Reputation: 7470

If it were PHP I would use something like the following to capture that string.

/^(\w+\-){2}(?<string>.+?)\-\d{4}(\-\d{2}){5}(\.\w+){2}$/
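
Java has supported named groups since Java 7, so a rough port of the same pattern might look like the sketch below. The class name NamedGroupExample and the helper extract are invented for illustration; the pattern assumes exactly two prefix words, five two-digit date/time fields after the year, and two file extensions, as in the sample name.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class NamedGroupExample {
    // Same structure as the PHP pattern, with Java string escaping.
    private static final Pattern NAME = Pattern.compile(
            "^(\\w+-){2}(?<string>.+?)-\\d{4}(-\\d{2}){5}(\\.\\w+){2}$");

    // Hypothetical helper: returns the named group, or null if the whole name does not match.
    static String extract(String fileName) {
        Matcher m = NAME.matcher(fileName);
        return m.matches() ? m.group("string") : null;
    }

    public static void main(String[] args) {
        // prints: string-iwant
        System.out.println(extract("file-type-string-iwant-2000-01-01-01-01-01.conf.gz"));
    }
}
```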

Upvotes: 0

bezmax

Reputation: 26132

Here's the regex you need:

\\-([^-]+\\-[^-]+)\\-\\d{4}\\-

Basically it means:

  • \\- starts with a minus
  • ([^-]+\\-[^-]+) contains 1 or more non-minus symbols, then a minus, then 1 or more non-minus symbols. This part is captured.
  • \\-\\d{4}\\- a minus sign, 4 digits, and another minus sign

However, that will only work if stuff-you-need has exactly one hyphen (or a fixed number of hyphens, in which case the regex needs adjusting). Otherwise, given the string file-type-string-i-want, there is no way to know whether the word type belongs to the string you want or not.
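
A small sketch of this version in Java, including the limitation just described. The class name CaptureGroupExample and the helper extract are hypothetical names used only for this example.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class CaptureGroupExample {
    // Hyphen, captured one-hyphen token, hyphen, 4-digit year, hyphen.
    private static final Pattern NAME = Pattern.compile("\\-([^-]+\\-[^-]+)\\-\\d{4}\\-");

    // Hypothetical helper: returns capture group 1, or null if nothing matches.
    static String extract(String fileName) {
        Matcher m = NAME.matcher(fileName);
        return m.find() ? m.group(1) : null;
    }

    public static void main(String[] args) {
        // One hyphen in the wanted part, prints: string-iwant
        System.out.println(extract("file-type-string-iwant-2000-01-01-01-01-01.conf.gz"));
        // Two hyphens in the wanted part, only the tail is captured, prints: i-want
        System.out.println(extract("file-type-string-i-want-2000-01-01-01-01-01.conf.gz"));
    }
}
```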

Added:

In case the file-type always contains exactly one hyphen, you can capture the required part this way:

[^-]+\\-[^-]+\\-(.*)\\-\\d{4}\\-

Explanation:

  • [^-]+\\-[^-]+\\- some non-hyphen characters, a hyphen, more non-hyphen characters, and a trailing hyphen. This skips the file-type prefix together with the hyphen that follows it.
  • \\-\\d{4}\\- a hyphen, 4 digits, and another hyphen
  • (.*) everything between the previous two parts is captured as the string you need
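
Here is a minimal sketch of the second pattern, assuming the file-type prefix really does always contain exactly one hyphen. The class name PrefixSkipExample and the helper extract are placeholders for illustration.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class PrefixSkipExample {
    // Skip "xxx-yyy-", capture everything up to "-<4 digits>-".
    private static final Pattern NAME = Pattern.compile("[^-]+\\-[^-]+\\-(.*)\\-\\d{4}\\-");

    // Hypothetical helper: returns capture group 1, or null if nothing matches.
    static String extract(String fileName) {
        Matcher m = NAME.matcher(fileName);
        return m.find() ? m.group(1) : null;
    }

    public static void main(String[] args) {
        // prints: string-iwant
        System.out.println(extract("file-type-string-iwant-2000-01-01-01-01-01.conf.gz"));
        // The wanted part may now contain any number of hyphens, prints: string-i-want
        System.out.println(extract("file-type-string-i-want-2000-01-01-01-01-01.conf.gz"));
    }
}
```

Note that (.*) is greedy, so this relies on the year being the only place in the name where a hyphen, four digits, and another hyphen appear in a row.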

Upvotes: 4
