D. Sergeev

Reputation: 2355

Regex for parsing a Java enum

Hi everyone!

I want to get the enum constants using a single regex:

public*[ ]enum*[ ]DynamicVocabularyName[ ]*{([\ \n\r\t]*([A-Z\_]*)[ ]*\,)*[\ \n\r\t]*

Input:

package ..;

public enum DynamicVocabularyName {
    ACTIVE_CRIMES_COUNT,
    ACTIVE_CRIME_TYPE,
    APPLICATION_NAME,
    ARREST_COUNT,
    AREA_DESCRIPTION,
    AREA_NAME,
    BAD_PARAMETER_NAME,
    BATTERY_PERCENT,
    CAROWNER_PERSON_ADDRESS,
    CAROWNER_PERSON_FULLNAME,
    CAROWNER_PERSON_PHONE,
    CRIME_DATE_DESCRIPTION,
    CRIME_SUBTYPE,
}

Desired output, an array of enum values:

ACTIVE_CRIMES_COUNT
ACTIVE_CRIME_TYPE
APPLICATION_NAME
ARREST_COUNT
AREA_DESCRIPTION
...

But I get only the last enum value.

How can I fix my regex to get all the values?

Link to this example: https://regex101.com/r/rX7pJ1/3

Upvotes: 1

Views: 3820

Answers (4)

SubOptimal

Reputation: 22963

If you have to deal with several different Java source files, it might be worth having a look at JavaParser.

For your current problem, a solution might look like this:

// mvn artefact: com.google.code.javaparser:javaparser
//import japa.parser.JavaParser;
//import japa.parser.ParseException;
//import japa.parser.ast.CompilationUnit;
//import japa.parser.ast.body.EnumConstantDeclaration;
//import japa.parser.ast.body.EnumDeclaration;
//import japa.parser.ast.body.TypeDeclaration;

// mvn artefact: com.github.javaparser:javaparser-core
import com.github.javaparser.JavaParser;
import com.github.javaparser.ParseException;
import com.github.javaparser.ast.CompilationUnit;
import com.github.javaparser.ast.body.EnumConstantDeclaration;
import com.github.javaparser.ast.body.EnumDeclaration;
import com.github.javaparser.ast.body.TypeDeclaration;

import java.io.ByteArrayInputStream;
import java.io.IOException;
import java.util.List;

public class EnumNames {

    public static void main(String[] args) throws ParseException, IOException {
        String code = "public enum DynamicVocabularyName {\n"
                + "    ACTIVE_CRIMES_COUNT,\n"
                + "    ACTIVE_CRIME_TYPE,\n"
                + "    APPLICATION_NAME,\n"
                + "    ARREST_COUNT,\n"
                + "    AREA_DESCRIPTION,\n"
                + "    AREA_NAME,\n"
                + "    BAD_PARAMETER_NAME,\n"
                + "    BATTERY_PERCENT,\n"
                + "    CAROWNER_PERSON_ADDRESS,\n"
                + "    CAROWNER_PERSON_FULLNAME,\n"
                + "    CAROWNER_PERSON_PHONE,\n"
                + "    CRIME_DATE_DESCRIPTION,\n"
                + "    CRIME_SUBTYPE,\n"
                + "}";

        ByteArrayInputStream in = new ByteArrayInputStream(code.getBytes());
        CompilationUnit cu = JavaParser.parse(in);
        List<TypeDeclaration> types = cu.getTypes();
        for (TypeDeclaration type : types) {
            if (type instanceof EnumDeclaration) {
                List<EnumConstantDeclaration> enumConstants = 
                        ((EnumDeclaration) type).getEntries();
                for (EnumConstantDeclaration enumConstant : enumConstants) {
                    System.out.println("enum constant: " + enumConstant.getName());
                }
            }
        }
    }
}

Output:

enum constant: ACTIVE_CRIMES_COUNT
enum constant: ACTIVE_CRIME_TYPE
enum constant: APPLICATION_NAME
enum constant: ARREST_COUNT
enum constant: AREA_DESCRIPTION
enum constant: AREA_NAME
enum constant: BAD_PARAMETER_NAME
enum constant: BATTERY_PERCENT
enum constant: CAROWNER_PERSON_ADDRESS
enum constant: CAROWNER_PERSON_FULLNAME
enum constant: CAROWNER_PERSON_PHONE
enum constant: CRIME_DATE_DESCRIPTION
enum constant: CRIME_SUBTYPE

Upvotes: 1

fluminis

Reputation: 4079

  • public* matches publi, public, publicccc, ... Replace it with (public)?, which matches public zero or one time.

  • The same applies to enum*.

  • { is a special character in regex, so you need to escape it: \{

No guarantees, but this seems to be enough:

(\s*([A-Z][A-Z_]*)\s*[,\}]\s*)
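A minimal Java sketch of how this pattern could be applied, assuming the source is available as a String. A repeated capturing group keeps only its last iteration, which is why the original regex returns just the final constant; here the pattern matches one constant at a time and Matcher.find() is called in a loop. The class and method names (EnumConstantFinder, findConstants) are made up for illustration.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical helper, not part of any library.
class EnumConstantFinder {

    // The pattern from the answer above; group 2 holds the constant name.
    private static final Pattern CONSTANT =
            Pattern.compile("(\\s*([A-Z][A-Z_]*)\\s*[,\\}]\\s*)");

    // Collects one constant per successful find() call.
    static List<String> findConstants(String source) {
        List<String> names = new ArrayList<>();
        Matcher m = CONSTANT.matcher(source);
        while (m.find()) {
            names.add(m.group(2));
        }
        return names;
    }

    public static void main(String[] args) {
        // Shortened version of the input from the question.
        String source = "package ..;\n\n"
                + "public enum DynamicVocabularyName {\n"
                + "    ACTIVE_CRIMES_COUNT,\n"
                + "    ACTIVE_CRIME_TYPE,\n"
                + "    APPLICATION_NAME,\n"
                + "}";
        // prints [ACTIVE_CRIMES_COUNT, ACTIVE_CRIME_TYPE, APPLICATION_NAME]
        System.out.println(findConstants(source));
    }
}
```

Note that [A-Z_] does not cover digits, so a constant such as VALUE_1 would not be picked up without extending the character class.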

Upvotes: 2

anubhava

Reputation: 785481

To capture all enum constants in your example code, you can use this regex based on \G, which asserts the position at the end of the previous match:

(?:\spublic\s+enum\s+DynamicVocabularyName\s*\{|\G,)\s+([A-Z_]+)(?=[^{}]*\})

RegEx Demo
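As a rough sketch, the \G-based regex could be used from Java like this. The first alternative anchors the first match at the enum header, and every later match has to start with a comma exactly where the previous one ended. The class and method names (EnumConstantsWithG, extract) are assumptions for illustration only.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical helper, not part of any library.
class EnumConstantsWithG {

    // The regex from the answer above; group 1 holds the constant name.
    private static final Pattern CONSTANT = Pattern.compile(
            "(?:\\spublic\\s+enum\\s+DynamicVocabularyName\\s*\\{|\\G,)"
            + "\\s+([A-Z_]+)(?=[^{}]*\\})");

    // Each find() continues from the end of the previous match via \G.
    static List<String> extract(String source) {
        List<String> names = new ArrayList<>();
        Matcher m = CONSTANT.matcher(source);
        while (m.find()) {
            names.add(m.group(1));
        }
        return names;
    }

    public static void main(String[] args) {
        // Shortened version of the input from the question.
        String source = "package ..;\n\n"
                + "public enum DynamicVocabularyName {\n"
                + "    ACTIVE_CRIMES_COUNT,\n"
                + "    ACTIVE_CRIME_TYPE,\n"
                + "    APPLICATION_NAME,\n"
                + "}";
        // prints [ACTIVE_CRIMES_COUNT, ACTIVE_CRIME_TYPE, APPLICATION_NAME]
        System.out.println(extract(source));
    }
}
```

The pattern starts with \s, so it expects whitespace before public; a source string that begins directly with public enum would yield no matches unless that leading \s is relaxed.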

Upvotes: 1

Federico Piazza

Reputation: 31035

Your question is not entirely clear and you haven't provided any code, but if I understood it correctly, you can use a regex like this:

\b([A-Z_]+)\b

Working demo
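A small Java sketch of this simpler approach, assuming the whole file content is passed in as a String. The names used here (UpperCaseTokens, allCapsWords) are invented for the example.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical helper, not part of any library.
class UpperCaseTokens {

    // The regex from the answer above: whole words made of A-Z and underscores.
    private static final Pattern WORD = Pattern.compile("\\b([A-Z_]+)\\b");

    // Returns every all-caps word found anywhere in the text.
    static List<String> allCapsWords(String source) {
        List<String> words = new ArrayList<>();
        Matcher m = WORD.matcher(source);
        while (m.find()) {
            words.add(m.group(1));
        }
        return words;
    }

    public static void main(String[] args) {
        // Shortened version of the input from the question.
        String source = "public enum DynamicVocabularyName {\n"
                + "    ACTIVE_CRIMES_COUNT,\n"
                + "    ACTIVE_CRIME_TYPE,\n"
                + "}";
        // prints [ACTIVE_CRIMES_COUNT, ACTIVE_CRIME_TYPE]
        System.out.println(allCapsWords(source));
    }
}
```

Because the pattern is not tied to the enum declaration, it also picks up any other all-caps word in the input, for example TODO in a comment, so it fits best when the text is known to contain only the enum.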

Upvotes: 3
