Peter Penzov

Reputation: 1678

First pattern key is never found

I want to read the comments from a .sql file and extract the values:

<!--
@fake: some 
@author: some 
@ticket: ti-1232323 
@fix: some fix 
@release: master
@description: This is test example
-->

Code:

String text = String.join("", Files.readAllLines(file.toPath()));

Pattern pattern = Pattern.compile("^\\s*@(?<key>(fake|author|description|fix|ticket|release)): (?<value>.*?)$", Pattern.MULTILINE);
Matcher matcher = pattern.matcher(text);

while (matcher.find())
{
    if (matcher.group("key").equals("author")) {
        author = matcher.group("value");
    }

    if (matcher.group("key").equals("description")) {
        description = matcher.group("value");
    }    
}

The first key, in this case fake, is always empty. If I make author the first key, it is empty instead. How can I fix the regex pattern?

Upvotes: 1

Views: 71

Answers (2)

The fourth bird

Reputation: 163287

If the <!-- and --> delimiters must be present, you can use the \G anchor to get consecutive matches while keeping the named groups.

Note that the alternatives are already inside the named capturing group (?<key>, so you don't have to wrap them in another group. The part in group value does not have to be non-greedy, as you are matching to the end of the line anyway.

As @Wiktor Stribiżew mentioned, you are joining the lines back together without a newline, so the separate parts cannot be matched using, for example, the anchor $, which with Pattern.MULTILINE asserts the end of a line.
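To make the effect of the join separator concrete, here is a minimal sketch that runs the pattern from the question against a shortened version of the sample comment, once joined with "" and once with "\n". The class name JoinDemo and the helper keys are made up for illustration, and the four input lines are an assumed stand-in for the real file.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class JoinDemo {
    // The pattern from the question, unchanged.
    static final Pattern PATTERN = Pattern.compile(
        "^\\s*@(?<key>(fake|author|description|fix|ticket|release)): (?<value>.*?)$",
        Pattern.MULTILINE);

    // Hypothetical helper: collects every matched key, in order of appearance.
    static List<String> keys(String text) {
        List<String> found = new ArrayList<>();
        Matcher m = PATTERN.matcher(text);
        while (m.find()) {
            found.add(m.group("key"));
        }
        return found;
    }

    public static void main(String[] args) {
        // Assumed stand-in for Files.readAllLines(file.toPath()).
        List<String> lines = Arrays.asList("<!--", "@fake: some ", "@author: some ", "-->");

        // Joined without a separator, the text is a single line that starts
        // with "<!--", so ^\s*@ never matches.
        System.out.println(keys(String.join("", lines)));   // prints []

        // Joined with "\n", each @key starts its own line again.
        System.out.println(keys(String.join("\n", lines))); // prints [fake, author]
    }
}
```

On this sample, nothing matches without the separator, while both keys are found once the newlines are restored.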

Pattern

(?:^<!--(?=.*(?:\R(?!-->).*)*\R-->)|\G(?!^))\R@(?<key>fake|author|description|fix|ticket|release): (?<value>.*)$

Explanation

  • (?: Non capture group
    • ^ Start of line
    • <!-- Match literally
    • (?=.*(?:\R(?!-->).*)*\R-->) Assert an ending -->
    • | Or
    • \G(?!^) Assert the position at the end of the previous match, but not at the start of a line (this rules out the start of the input, where \G also matches)
  • ) Close group
  • \R@ Match a Unicode newline sequence and @
  • (?<key> Named group key, match any of the alternatives
    • fake|author|description|fix|ticket|release
  • ): Match literally
  • (?<value>.*)$ Named group value: match any character except a line terminator, up to the end of the line


Example code

String text = String.join("\n", Files.readAllLines(file.toPath()));
String regex = "(?:^<!--(?=.*(?:\\R(?!-->).*)*\\R-->)|\\G(?!^))\\R@(?<key>fake|author|description|fix|ticket|release): (?<value>.*)$";
Pattern pattern = Pattern.compile(regex, Pattern.MULTILINE);
Matcher matcher = pattern.matcher(text);

while (matcher.find()) {
    if (matcher.group("key").equals("author")) {
        System.out.println(matcher.group("value"));
    }

    if (matcher.group("key").equals("description")) {
        System.out.println(matcher.group("value"));
    }
}

Output

some 
This is test example
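
If all of the keys are needed rather than just author and description, the same loop can collect them into a map. The following is only a sketch under a few assumptions: the class name CommentTags and the method parse are hypothetical, the input is the sample comment joined with "\n", and the call to trim() is an addition here to drop the trailing spaces present in the sample.

```java
import java.util.LinkedHashMap;
import java.util.Map;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class CommentTags {
    // Same pattern as in the example code above.
    static final Pattern PATTERN = Pattern.compile(
        "(?:^<!--(?=.*(?:\\R(?!-->).*)*\\R-->)|\\G(?!^))\\R@(?<key>fake|author|description|fix|ticket|release): (?<value>.*)$",
        Pattern.MULTILINE);

    // Hypothetical helper: returns the key/value pairs in the order they appear.
    static Map<String, String> parse(String text) {
        Map<String, String> tags = new LinkedHashMap<>();
        Matcher m = PATTERN.matcher(text);
        while (m.find()) {
            // trim() is an assumption: it removes the trailing space after "some ".
            tags.put(m.group("key"), m.group("value").trim());
        }
        return tags;
    }

    public static void main(String[] args) {
        String text = String.join("\n",
            "<!--",
            "@fake: some ",
            "@author: some ",
            "@ticket: ti-1232323 ",
            "@fix: some fix ",
            "@release: master",
            "@description: This is test example",
            "-->");
        System.out.println(parse(text));
        // prints {fake=some, author=some, ticket=ti-1232323, fix=some fix, release=master, description=This is test example}
    }
}
```

A LinkedHashMap is used so that the keys come back in file order; a plain HashMap would work equally well if order does not matter.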

Upvotes: 0

Tim Biegeleisen

Reputation: 521194

Use the following regex pattern:

(?<!\S)@(?<key>(?:fake|author|description|fix|ticket|release)): (?<value>.*?(?![^@]))

The negative lookbehind (?<!\S) used above asserts that the @ is preceded by whitespace or by the start of the string, covering the initial edge case. The negative lookahead (?![^@]) at the end of the pattern stops the value just before the next @ term begins, or at the end of the input.

// Join with "\n" (not "") so that every @key is preceded by whitespace, which the lookbehind (?<!\S) requires
String text = String.join("\n", Files.readAllLines(file.toPath()));
Pattern pattern = Pattern.compile("(?<!\\S)@(?<key>(?:fake|author|description|fix|ticket|release)): (?<value>.*?(?![^@]))", Pattern.DOTALL);
Matcher matcher = pattern.matcher(text);

while (matcher.find()) {
    if ("author".equals(matcher.group("key"))) {
        author = matcher.group("value");
    }
    if ("description".equals(matcher.group("key"))) {
        description = matcher.group("value");
    }
}
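
To see exactly what the lookahead captures, the sketch below runs this pattern on a shortened sample and prints the raw values. It assumes the lines are joined with "\n"; the class name LookaroundDemo and the helper rawValue are hypothetical. Because the value runs up to the next @ or the end of the input under DOTALL, it keeps the trailing newline, and the last value also picks up the closing -->, so some trimming may be needed afterwards.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class LookaroundDemo {
    // Same pattern and flag as in the answer above.
    static final Pattern PATTERN = Pattern.compile(
        "(?<!\\S)@(?<key>(?:fake|author|description|fix|ticket|release)): (?<value>.*?(?![^@]))",
        Pattern.DOTALL);

    // Hypothetical helper: raw captured value for the given key, or null if absent.
    static String rawValue(String text, String key) {
        Matcher m = PATTERN.matcher(text);
        while (m.find()) {
            if (key.equals(m.group("key"))) {
                return m.group("value");
            }
        }
        return null;
    }

    public static void main(String[] args) {
        // Assumed input: a shortened version of the sample comment.
        String text = String.join("\n",
            "<!--",
            "@fake: some ",
            "@release: master",
            "@description: This is test example",
            "-->");

        // Newlines are shown as \n to make the captured text visible.
        System.out.println("[" + rawValue(text, "fake").replace("\n", "\\n") + "]");
        // prints [some \n]
        System.out.println("[" + rawValue(text, "description").replace("\n", "\\n") + "]");
        // prints [This is test example\n-->]
    }
}
```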

Upvotes: 1
