Reputation: 4220
File file = new File("file-type-string-i-want-2000-01-01-01-01-01.conf.gz");
String fileName = file.getName();
String nameFragment = null;
// Match everything between the first hyphen and a "-yyyy" year
Matcher matcher = Pattern.compile("\\-(.*)\\-\\d{4}").matcher(fileName);
StringBuilder sb = new StringBuilder();
while (matcher.find()) {
    sb.append(matcher.group());
}
List<String> stringList = Arrays.asList(sb.toString().split("-"));
if (stringList.size() >= 2) {
    nameFragment = stringList.get(stringList.size() - 2);
}
Desired result is to extract
string-iwant
from strings that look like this
file-type-string-iwant-2000-01-01-01-01-01.conf.gz
Unfortunately, "string-iwant" is a variable-length run of alphanumeric characters that will include only ONE hyphen BUT never start with a hyphen. The date formatting is consistent and the year always comes right after the string, so my current approach is to match on the -year, but I'm having difficulty excluding the stuff at the beginning.
Thanks for any thoughts or ideas
Edit: updated strings
Upvotes: 4
Views: 515
Reputation: 785108
The regex that I would use for this purpose is this with a positive lookahead:
Pattern p = Pattern.compile("[^-]+-[^-]+(?=-\\d{4})");
This simply means: match text containing exactly one hyphen, but only where it is immediately followed by a hyphen and a 4-digit year. The lookahead checks for the year without including it in the match.
Then you can simply grab matcher.group(0) as your matched text, which will be string-iwant in this case.
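A minimal sketch of how this pattern might be used, assuming the file name has the format shown in the question. The class name LookaheadDemo and the helper extract are made up for illustration.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class LookaheadDemo {
    // Two hyphen-free tokens joined by one hyphen, matched only when
    // a hyphen and four digits follow (the lookahead consumes nothing).
    private static final Pattern P = Pattern.compile("[^-]+-[^-]+(?=-\\d{4})");

    // Hypothetical helper: returns the first match, or null if there is none.
    static String extract(String fileName) {
        Matcher m = P.matcher(fileName);
        return m.find() ? m.group(0) : null;
    }

    public static void main(String[] args) {
        // prints: string-iwant
        System.out.println(extract("file-type-string-iwant-2000-01-01-01-01-01.conf.gz"));
    }
}
```

Note that find() must be called (and return true) before group(0) is read; otherwise the matcher throws IllegalStateException.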
Upvotes: 0
Reputation: 7470
If it were PHP I would use something like the following to capture that string.
/^(\w+\-){2}(?<string>.+?)\-\d{4}(\-\d{2}){5}(\.\w+){2}$/
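Since the question is about Java, here is a rough port of the same pattern as a sketch. Java (7 and later) also supports named groups with the (?<name>...) syntax. The class name NamedGroupDemo and the helper extract are hypothetical, and the pattern assumes exactly two hyphen-terminated prefix words and exactly two extensions, as in the sample file name.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class NamedGroupDemo {
    // Same structure as the PHP pattern, with Java string escaping.
    private static final Pattern P = Pattern.compile(
            "^(\\w+-){2}(?<string>.+?)-\\d{4}(-\\d{2}){5}(\\.\\w+){2}$");

    // Hypothetical helper: returns the named group, or null if the
    // whole file name does not fit the expected layout.
    static String extract(String fileName) {
        Matcher m = P.matcher(fileName);
        return m.matches() ? m.group("string") : null;
    }

    public static void main(String[] args) {
        // prints: string-iwant
        System.out.println(extract("file-type-string-iwant-2000-01-01-01-01-01.conf.gz"));
    }
}
```

Because the whole name is anchored with ^ and $, this variant rejects file names that deviate from the layout instead of returning a partial match.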
Upvotes: 0
Reputation: 26132
Here's the regex you need:
\\-([^-]+\\-[^-]+)\\-\\d{4}\\-
Basically it means:
-
starts with a minus
([^-]+\\-[^-]+)
contains 1 or more non-minus symbols, then a minus, then 1 or more non-minus symbols. This part is captured.
-\d{4}
a minus sign and 4 digits
However, that will only work if stuff-you-need has only one hyphen (or a constant number of hyphens, which would require adjusting the regex). Otherwise, given the string file-type-string-i-want, there is no way to know whether the word type belongs to the string you want or not.
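As a sketch of how the captured part could be read in Java, assuming the one-hyphen format from the question (the class name CaptureGroupDemo and the helper extract are invented for this example):

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class CaptureGroupDemo {
    // Hyphen, captured "token-token", hyphen, four digits, hyphen.
    // A hyphen needs no escaping outside a character class.
    private static final Pattern P = Pattern.compile("-([^-]+-[^-]+)-\\d{4}-");

    // Hypothetical helper: returns capture group 1, or null if nothing matches.
    static String extract(String fileName) {
        Matcher m = P.matcher(fileName);
        return m.find() ? m.group(1) : null;
    }

    public static void main(String[] args) {
        // prints: string-iwant
        System.out.println(extract("file-type-string-iwant-2000-01-01-01-01-01.conf.gz"));
    }
}
```

Here group(1) is used rather than group(0), because the whole match also includes the surrounding hyphens and the year.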
Added:
In case the file-type
always contains exactly one hyphen, you can capture the required part this way:
[^-]+\\-[^-]+\\-(.*)\\-\\d{4}\\-
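A short sketch of this second pattern in use, under the stated assumption that the file-type prefix always contains exactly one hyphen. The class name PrefixSkipDemo and the helper extract are hypothetical.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class PrefixSkipDemo {
    // Skip "token-token-", capture everything up to "-yyyy-".
    private static final Pattern P = Pattern.compile("[^-]+-[^-]+-(.*)-\\d{4}-");

    // Hypothetical helper: returns capture group 1, or null if nothing matches.
    static String extract(String fileName) {
        Matcher m = P.matcher(fileName);
        return m.find() ? m.group(1) : null;
    }

    public static void main(String[] args) {
        // The captured part may contain any number of hyphens.
        System.out.println(extract("file-type-string-iwant-2000-01-01-01-01-01.conf.gz"));
        System.out.println(extract("file-type-string-i-want-2000-01-01-01-01-01.conf.gz"));
    }
}
```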
Explanation:
[^-]+\-[^-]+\-
some amount of non-hyphen characters, then a hyphen, then more non-hyphens, then another hyphen. This skips the file-type string together with the hyphen that follows it.
\-\d{4}\-
a hyphen, 4 digits, followed by another hyphen
(.*)
everything between the previous two parts is captured as the string you need to select
Upvotes: 4