Kumar

Reputation: 21

Getting a substring from a string based on underscores with a regular expression in Java

I am working on a middleware tool that has a predefined option for using Java regular expressions: subStringRegEx(regex, string).

My requirement is to get a specific substring between the underscores (_) of a given filename, e.g. adbc1234-ed98 from ABC_XYZ_123_adbc1234-ed98_1234.dat.

I have tried the 3 patterns below, and all of them work when tested with online regex testers set to the Java flavor. However, they do not work as expected in my tool: I get “ABC_XYZ_123_adbc1234-ed98” instead of only “adbc1234-ed98”.

  1. (?:[^_]+)_(?:[^_]+)_(?:[^_]+)_([^_]+)
  2. .*?_.*?_.*?_([^_]+)
  3. ^[^_]*_[^_]*_[^_]*_([^_]*)_
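A likely explanation, assuming subStringRegEx returns the whole match (group 0) rather than a capture group: each of the three patterns consumes the leading ABC_XYZ_123_ prefix, so the overall match contains it. The sketch below uses plain java.util.regex, not the middleware tool, to show the difference; GroupDemo and firstMatch are hypothetical names.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class GroupDemo {
    // Returns the requested group of the first match, or null if nothing matches.
    static String firstMatch(String regex, String input, int group) {
        Matcher m = Pattern.compile(regex).matcher(input);
        return m.find() ? m.group(group) : null;
    }

    public static void main(String[] args) {
        String s = "ABC_XYZ_123_adbc1234-ed98_1234.dat";
        String regex = "(?:[^_]+)_(?:[^_]+)_(?:[^_]+)_([^_]+)";
        // Assumption: the tool returns something like group 0 (the whole match)
        System.out.println(firstMatch(regex, s, 0)); // ABC_XYZ_123_adbc1234-ed98
        // Capture group 1 holds only the wanted field
        System.out.println(firstMatch(regex, s, 1)); // adbc1234-ed98
    }
}
```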

Any suggestions on how to achieve this would be appreciated.

Thanks, Kumar

Upvotes: 2

Views: 130

Answers (5)

The fourth bird

Reputation: 163342

Just for completeness: all 3 patterns work, but you have to take the value from capture group 1, not from the whole match.

Example

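import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class Main {
    public static void main(String[] args) {
        String[] patterns = {
            "(?:[^_]+)_(?:[^_]+)_(?:[^_]+)_([^_]+)",
            ".*?_.*?_.*?_([^_]+)",
            "^[^_]*_[^_]*_[^_]*_([^_]*)_"
        };

        String s = "ABC_XYZ_123_adbc1234-ed98_1234.dat";

        for (String p : patterns) {
            Pattern pattern = Pattern.compile(p);
            Matcher matcher = pattern.matcher(s);

            // group(1) is the capture group; group() would return the whole match
            if (matcher.find()) {
                System.out.println(matcher.group(1));
            }
        }
    }
}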

Output

adbc1234-ed98
adbc1234-ed98
adbc1234-ed98

See a Java demo.

Upvotes: 1

user4910279

Reputation:

I'm not sure about the spec for subStringRegEx(regex, string), but if it returns the substring of string that matches regex (the whole match, $0), then the regex should be:

String regex = "[^_]+(?=_[^_]*$)";
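A minimal java.util.regex sketch of this idea, assuming the tool behaves like Matcher.find() followed by group() (the whole match); LookaheadDemo and extract are hypothetical names. Because the lookahead does not consume anything, the whole match is only the field itself. Note that it counts from the end: it picks the second-to-last underscore-separated field, which is the wanted value in the sample name.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class LookaheadDemo {
    // [^_]+ is the field; (?=_[^_]*$) requires "_" plus an underscore-free tail
    // up to the end of the string, without making it part of the match.
    static final Pattern SECOND_TO_LAST = Pattern.compile("[^_]+(?=_[^_]*$)");

    // Assumption: mimics a tool that returns the whole match (group 0).
    static String extract(String fileName) {
        Matcher m = SECOND_TO_LAST.matcher(fileName);
        return m.find() ? m.group() : null;
    }

    public static void main(String[] args) {
        System.out.println(extract("ABC_XYZ_123_adbc1234-ed98_1234.dat")); // adbc1234-ed98
    }
}
```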

Upvotes: 0

charmful0x

Reputation: 153

You can simply use string methods to achieve this (example in JavaScript):

const str = "ABC_XYZ_123_adbc1234-ed98_1234.dat";

// split on "_" and take the fourth field (index 3)
const part = str.split("_")[3];
console.log(part); // adbc1234-ed98
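In Java, the same String-method idea could look like the sketch below, assuming the wanted value is always the fourth underscore-separated field (index 3) and that plain Java code can be used instead of the tool's subStringRegEx; SplitDemo and field are hypothetical names. String.split takes a regex, but "_" has no special meaning, so it can be passed as is.

```java
class SplitDemo {
    // Splits on "_" and returns the field at the given zero-based index,
    // or null if the name has fewer fields.
    // Assumption: the wanted value is always at the same position.
    static String field(String fileName, int index) {
        String[] parts = fileName.split("_");
        return index < parts.length ? parts[index] : null;
    }

    public static void main(String[] args) {
        System.out.println(field("ABC_XYZ_123_adbc1234-ed98_1234.dat", 3)); // adbc1234-ed98
    }
}
```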

Upvotes: 0

RavinderSingh13

Reputation: 133518

With your shown samples, please try the following regex. The value is captured in group 1, so replace with $1 when performing the substitution.

^(?:.*?_){3}([^_]*)_.*\.dat$

Online demo for the above regex

Or, in case the files could have any extension (not only .dat), try the following:

^(?:.*?_){3}([^_]*)_.*

Online demo for the above regex

Explanation: a detailed breakdown of the above regex.

^(?:.*?_){3}  ## Match from the start of the value: a non-greedy match up to _, 3 times, in a non-capturing group.
([^_]*)       ## 1st capturing group: all characters up to the next occurrence of _.
_.*\.dat$     ## Match from that _ through .dat at the end of the value.
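A minimal check of both patterns, assuming the substitution behaves like Java's String.replaceAll (the middleware's own substitution may differ); ReplaceDemo, DAT_ONLY and ANY_EXT are hypothetical names. In a Java string literal the backslash in \.dat has to be doubled.

```java
class ReplaceDemo {
    // Only names ending in .dat match; group 1 is the 4th field.
    static final String DAT_ONLY = "^(?:.*?_){3}([^_]*)_.*\\.dat$";
    // Any extension.
    static final String ANY_EXT = "^(?:.*?_){3}([^_]*)_.*";

    public static void main(String[] args) {
        String s = "ABC_XYZ_123_adbc1234-ed98_1234.dat";
        // The whole string matches, so it is replaced by group 1 ($1).
        // Assumption: plain String.replaceAll semantics.
        System.out.println(s.replaceAll(DAT_ONLY, "$1")); // adbc1234-ed98
        System.out.println(s.replaceAll(ANY_EXT, "$1"));  // adbc1234-ed98
    }
}
```

One thing to keep in mind with replace-based extraction: if a name does not match (for example, a .txt file with DAT_ONLY), replaceAll returns the input unchanged rather than an empty value.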

Upvotes: 4

Wiktor Stribiżew

Reputation: 626816

You can use

^(?:[^_]+_){3}([^_]+).*

and replace with $1. See the regex demo.

Details:

  • ^ - start of string
  • (?:[^_]+_){3} - three occurrences of any one or more chars other than _ and then a _ char
  • ([^_]+) - Group 1 (referred to with $1 from the replacement pattern): one or more chars other than _
  • .* - the rest of the string.
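As a sketch in plain Java (not the middleware tool), the same pattern can be applied with Matcher.matches() and group(1), which behaves like "replace with $1" for single-line names but returns null instead of the unchanged input when there are fewer than four fields; FourthFieldDemo and fourthField are hypothetical names.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class FourthFieldDemo {
    static final Pattern FOURTH = Pattern.compile("^(?:[^_]+_){3}([^_]+).*");

    // Returns group 1 when the whole name matches, else null.
    // Assumption: file names contain no line breaks (.* does not match them).
    static String fourthField(String fileName) {
        Matcher m = FOURTH.matcher(fileName);
        return m.matches() ? m.group(1) : null;
    }

    public static void main(String[] args) {
        String s = "ABC_XYZ_123_adbc1234-ed98_1234.dat";
        System.out.println(fourthField(s));                   // adbc1234-ed98
        System.out.println(s.replaceAll(FOURTH.pattern(), "$1")); // adbc1234-ed98
    }
}
```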

Another idea:

^.*_([^_]+)_[0-9]+\.[^._]*$

See this regex demo, and you will still need to replace with $1.

Details:

  • ^ - start of string
  • .* - any text (not including line break chars, as many as possible)
  • _ - a _ char
  • ([^_]+) - one or more chars other than _
  • _ - a _ char
  • [0-9]+ - one or more digits
  • \. - a . char (NOTE: in a Java string literal the \ must be doubled, i.e. "\\.")
  • [^._]* - any zero or more chars other than . and _
  • $ - end of string.
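Unlike the first pattern, this one locates the value from the end of the name: it takes the field just before "_<digits>.<extension>", however many fields come first. A small sketch with String.replaceAll, assuming Java replacement semantics; LastFieldsDemo and FROM_END are hypothetical names.

```java
class LastFieldsDemo {
    // Java string literal: the regex \. is written "\\."
    static final String FROM_END = "^.*_([^_]+)_[0-9]+\\.[^._]*$";

    public static void main(String[] args) {
        String s = "ABC_XYZ_123_adbc1234-ed98_1234.dat";
        System.out.println(s.replaceAll(FROM_END, "$1")); // adbc1234-ed98

        // With more leading fields it still picks the field before "_1234.dat".
        System.out.println("A_B_C_D_E_1234.dat".replaceAll(FROM_END, "$1")); // E
    }
}
```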

Upvotes: 1
