EngBIRD

Reputation: 2025

Compound regex for two possible patterns

I am trying to use regex pattern matching in Java to import values from a structure file whose entries follow two distinct patterns.

I have a file that may look like this:

[Group Variable]
name = Value

[Valid Extensions]
images = {
jpeg
png
}

This file is a config file for a Java program. I am using a modified version of the Java code from: What is the easiest way to parse an INI file in Java?

This code lets me make specific requests for a variable name like name. (Therefore there is no need to save anything to the left of the equals sign.)

The first pattern is simple, "Grab any content on the line after the equals sign". The regex for that is pretty simple: (\s*([^=]*)=(.*))

The second is a little more complicated: "grab all content after the equals sign between the curly braces (i.e. to enclose elements of an array spread out across multiple rows)".

I have tried to find the text between two curly braces using a modification of (?<=\\{)(.*?)(?=\\})

I have also tried to set up a condition that ignores a line containing an opening curly brace, like ([^\{]|^)* (https://stackoverflow.com/a/1264575/4383447). From my reading, regex supports if-then-else logic: (?(?=regex)then|else).

I haven't been able to get the regex for this, or the combination of the two, working. I would prefer a single, more complicated regex that handles both cases over iteration or recursion on the Java side.

Interestingly, some of my attempts fail on the Java side, while others that might have worked did not appear to work when tested at https://regex101.com/r/aG1xO0/2. A few of the attempts I still had recorded when I decided to post this question are below. I no longer have my attempts at if and or logic alternatives.

(\s*([^=]*)=\{)(.*?)(?=\})
(\s*([^=]*)=(?<=\{)(.*?)(?=\}))
\s*([^=]*)=(?(?=([^{]|^)(.*))(.*)|{([^}]*)})
\s*([^=]*)=(.*))|(\s*([^={*}]*)=\{)(.*?)(?=\})

Upvotes: 2

Views: 1612

Answers (3)

Pshemo

Reputation: 124225

Based on your description you may be looking for something like

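import java.util.regex.Matcher;
import java.util.regex.Pattern;

// data is the full text of the config file
Pattern p = Pattern.compile("=\\s*(\\{[^}]*\\}|.*)");
Matcher m = p.matcher(data);
while (m.find()) {
    // group 1 is either a whole {...} block or the rest of the line
    System.out.println(m.group(1));
    System.out.println("------");
}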

DEMO

Explanation.

We are looking for the part that comes after = and any optional whitespace. We don't want the = and whitespace themselves in the result, so we can either

  1. use look-behind (?<=...)

or

  2. wrap the needed part in a capturing group.

Option 1 (look-behind) is impossible here because a look-behind must have an obvious maximum length, which \s* (zero or more whitespace characters) prevents. That leaves option 2, the capturing group.
Now we need to describe the two cases we are interested in. To do so we use case1|case2 and put it in a capturing group. To avoid a situation where matching case1 prevents matching case2, the most specific case must come first. Here that is the regex for a block like {.\n.\n.}, because a regex that matches only one line would match just the {. part and prevent us from matching the rest, \n.\n.}.

Now {...} can be represented as \\{[^}]*\\}. [^}] means any character other than }, so it can also match line separators. This gives it an advantage over .*?: we don't need the Pattern.DOTALL flag to make . match every character including line separators. We also don't need the reluctant quantifier *?, which reduces performance a little because of backtracking.

Avoiding Pattern.DOTALL has another advantage: we can write the regex for the second case (the rest of the line after =) simply as .* because . will not match line separators.
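
The explanation above can be tried end to end with a minimal sketch like the following. It applies the same pattern to the sample file from the question; the class name ValueExtractor and the method extractValues are hypothetical names chosen for illustration.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class ValueExtractor {
    // Braced block first (most specific case), rest of the line as the fallback.
    private static final Pattern VALUE = Pattern.compile("=\\s*(\\{[^}]*\\}|.*)");

    // Hypothetical helper: collects group 1 of every match, in file order.
    static List<String> extractValues(String data) {
        List<String> values = new ArrayList<>();
        Matcher m = VALUE.matcher(data);
        while (m.find()) {
            values.add(m.group(1));
        }
        return values;
    }

    public static void main(String[] args) {
        // Sample content copied from the question.
        String data = "[Group Variable]\n"
                + "name = Value\n"
                + "\n"
                + "[Valid Extensions]\n"
                + "images = {\n"
                + "jpeg\n"
                + "png\n"
                + "}\n";
        for (String value : extractValues(data)) {
            System.out.println(value);
            System.out.println("------");
        }
    }
}
```

For the sample data this should yield two values: Value, and the whole block from { to } including its line breaks. Note that \s* can also consume a line break, so a line with nothing after = would pick up text from the following line.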


If you also want to include the property name, you could use the regex ^([^=\n\r]+?)\s*=\s*(\{([^}]*)\}|.*) with the MULTILINE flag (which lets ^ match at the start of each line, not only at the start of the entire text).
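
As a rough sketch of how that regex might be used, the following collects name/value pairs into a map. The class name PropertyExtractor, the method parse, and the decision to trim the braced content are assumptions made for illustration, not part of the original answer.

```java
import java.util.LinkedHashMap;
import java.util.Map;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class PropertyExtractor {
    // Group 1: property name, group 2: whole value,
    // group 3: text inside {...} (null when the value is not braced).
    private static final Pattern PROPERTY = Pattern.compile(
            "^([^=\\n\\r]+?)\\s*=\\s*(\\{([^}]*)\\}|.*)", Pattern.MULTILINE);

    // Hypothetical helper: maps each property name to its value.
    static Map<String, String> parse(String data) {
        Map<String, String> properties = new LinkedHashMap<>();
        Matcher m = PROPERTY.matcher(data);
        while (m.find()) {
            String braced = m.group(3);
            // Assumption: for braced values, keep only the trimmed inner text.
            properties.put(m.group(1), braced != null ? braced.trim() : m.group(2));
        }
        return properties;
    }

    public static void main(String[] args) {
        String data = "[Group Variable]\nname = Value\n\n"
                + "[Valid Extensions]\nimages = {\njpeg\npng\n}\n";
        for (Map.Entry<String, String> e : parse(data).entrySet()) {
            System.out.println(e.getKey() + " -> " + e.getValue());
        }
    }
}
```

Section headers such as [Group Variable] contain no = and are skipped, so for the sample data the map should hold only name and images.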

DEMO 2

Upvotes: 2

Darshan Mehta

Reputation: 30819

As not all the lines contain curly braces, I would recommend processing the String in two steps (so that you can still continue processing the original String if no match for curly braces is found).

Step 1 is to extract the Strings with your regex. Once we have a String, we can use the following to extract the content between curly braces:

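import java.util.regex.Matcher;
import java.util.regex.Pattern;

String string = "fdwfs{aaaa}fsfds";
Pattern pattern = Pattern.compile("\\{(.*?)\\}");
Matcher matcher = pattern.matcher(string);
while (matcher.find()) {
    // group 1 is the text between the braces
    System.out.println(matcher.group(1));
}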

The while loop is not entered if no match is found. In that case, we can process the whole String.
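
The second step could be wrapped in a small helper along these lines. This is only a sketch: the class name BraceContent and the method unwrap are hypothetical, it returns only the first match rather than looping, and Pattern.DOTALL is added as an assumption so that a braced block spanning several lines (as in the question's file) still matches.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class BraceContent {
    // Assumption: DOTALL lets . match line breaks inside a multi-line {...} block.
    private static final Pattern BRACES = Pattern.compile("\\{(.*?)\\}", Pattern.DOTALL);

    // Returns the text between the first pair of braces,
    // or the whole input when no braces are found.
    static String unwrap(String value) {
        Matcher matcher = BRACES.matcher(value);
        return matcher.find() ? matcher.group(1) : value;
    }

    public static void main(String[] args) {
        System.out.println(unwrap("fdwfs{aaaa}fsfds"));
        System.out.println(unwrap("Value"));
    }
}
```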

Upvotes: 0

Pravin Umamaheswaran

Reputation: 704

\{([\w\n]*)\}

This extracts jpeg and png from the structure: capture group 1 holds them, together with the line breaks between them.
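
A minimal sketch of turning that capture into separate entries is shown below. The class name BraceList and the method items are hypothetical. Since [\w\n] matches only word characters and \n, this assumes entries without spaces, dots, or hyphens and a file that uses \n line endings.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class BraceList {
    private static final Pattern BLOCK = Pattern.compile("\\{([\\w\\n]*)\\}");

    // Returns the non-empty lines found inside the first {...} block,
    // or an empty list when there is no such block.
    static List<String> items(String text) {
        List<String> items = new ArrayList<>();
        Matcher m = BLOCK.matcher(text);
        if (m.find()) {
            for (String line : m.group(1).split("\n")) {
                if (!line.isEmpty()) {
                    items.add(line);
                }
            }
        }
        return items;
    }

    public static void main(String[] args) {
        System.out.println(items("images = {\njpeg\npng\n}"));
    }
}
```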

Upvotes: 0
