DevRight

Reputation: 395

Java Regex to extract an id string, based on recurring sub-string of each id

I am reading in a log file and extracting certain data contained within the file. I am able to extract the time for each line of the log file.

Now I want to extract the ID "ieatrcxb4498-1". All of the IDs start with the substring ieatrcxb, so I have tried to match on that substring and return the full ID.

I have tried many different suggestions from other posts, but I have been unsuccessful with the following patterns:

(?i)\\b("ieatrcxb"(?:.+?)?)\\b
(?i)\\b\\w*"ieatrcxb"\\w*\\b"
^.*ieatrcxb.*$ 

I have also tried to extract the full ID based on the string starting with i and ending in 1, as they all do.

Line of the log file

150: 2017-06-14 18:02:21 INFO  monitorinfo           :     Info: Lock VCS on node "ieatrcxb4498-1"

Code

Scanner s = new Scanner(new FileReader(new File("lock-unlock.txt")));
    //Record currentRecord = null;
    ArrayList<Record> list = new ArrayList<>();

    while (s.hasNextLine()) {
        String line = s.nextLine();

        Record newRec = new Record();
        // newRec.time =
        newRec.time = regexChecker("([0-1]?\\d|2[0-3]):([0-5]?\\d):([0-5]?\\d)", line);

        newRec.ID = regexChecker("^.*ieatrcxb.*$", line);

        list.add(newRec);

    }


public static String regexChecker(String regEx, String str2Check) {

    Pattern checkRegex = Pattern.compile(regEx);
    Matcher regexMatcher = checkRegex.matcher(str2Check);
    String regMat = "";
    while(regexMatcher.find()){
        if(regexMatcher.group().length() !=0) {
            regMat = regexMatcher.group();
        }
        //System.out.println("Inside the "+ regexMatcher.group().trim());
    }

     return regMat;
}

I need a simple pattern which will do this for me.

Upvotes: 1

Views: 997

Answers (3)

user7605325

Reputation:

Does the ID always have the format "ieatrcxb followed by 4 digits, followed by -, followed by 1 digit"?

If that's the case, you can do:

regexChecker("ieatrcxb\\d{4}-\\d", line);

Note the {4} quantifier, which matches exactly 4 digits (\\d). If the last digit is always 1, you could also use "ieatrcxb\\d{4}-1".

If the number of digits varies, you can use "ieatrcxb\\d+-\\d+", where + means "1 or more".

You can also use the {} quantifier with the minimum and maximum number of occurrences. Example: "ieatrcxb\\d{4,6}-\\d" - {4,6} means "minimum of 4 and maximum of 6 occurrences" (that's just an example, I don't know if that's your case). This is useful if you know the range of digits the ID can have.

All of the above work for your case, returning ieatrcxb4498-1. Which one to use will depend on how your input varies.
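To compare the variants side by side, here is a minimal sketch that runs each pattern against the log line from the question. The class name IdPatternDemo and the helper firstMatch are made up for this illustration (they are not part of the question's code); firstMatch returns the first match rather than the last one, which makes no difference when a line contains a single ID.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class IdPatternDemo {

    // Hypothetical helper: returns the first match of regex in line, or "" if there is none.
    static String firstMatch(String regex, String line) {
        Matcher m = Pattern.compile(regex).matcher(line);
        return m.find() ? m.group() : "";
    }

    public static void main(String[] args) {
        // Sample line copied from the question.
        String line = "150: 2017-06-14 18:02:21 INFO  monitorinfo           :     Info: Lock VCS on node \"ieatrcxb4498-1\"";

        String[] patterns = {
            "ieatrcxb\\d{4}-\\d",    // exactly 4 digits, hyphen, 1 digit
            "ieatrcxb\\d{4}-1",      // exactly 4 digits, hyphen, literal 1
            "ieatrcxb\\d+-\\d+",     // 1 or more digits on each side of the hyphen
            "ieatrcxb\\d{4,6}-\\d"   // 4 to 6 digits, hyphen, 1 digit
        };

        // Print what each pattern extracts from the sample line.
        for (String p : patterns) {
            System.out.println(p + " -> " + firstMatch(p, line));
        }
    }
}
```

A line without an ID gives an empty string, the same convention regexChecker in the question uses.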


If you want just the numbers without the ieatrcxb part (4498-1), you can use a lookbehind regex:

regexChecker("(?<=ieatrcxb)\\d{4,6}-\\d", line);

The lookbehind requires ieatrcxb to appear before the digits but keeps it out of the match, so this returns just 4498-1.

If you also want to drop the -1 and get just 4498, you can combine this with a lookahead:

regexChecker("(?<=ieatrcxb)\\d{4,6}(?=-\\d)", line);

This returns just 4498.
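If you need several of these pieces at once, capturing groups are an alternative to lookarounds. This is a swapped-in technique, not what regexChecker does (it only returns group()), so treat it as a sketch; the class name IdPartsDemo and the method parts are invented for this example, and the {4,6} range is the same guess as above.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class IdPartsDemo {

    // Group 1 = digits right after the prefix, group 2 = digit after the hyphen.
    private static final Pattern ID = Pattern.compile("ieatrcxb(\\d{4,6})-(\\d)");

    // Hypothetical helper: returns {full id, number part, suffix digit}, or an empty array if no id is found.
    static String[] parts(String line) {
        Matcher m = ID.matcher(line);
        if (!m.find()) {
            return new String[0];
        }
        return new String[] { m.group(), m.group(1), m.group(2) };
    }

    public static void main(String[] args) {
        String line = "150: 2017-06-14 18:02:21 INFO  monitorinfo           :     Info: Lock VCS on node \"ieatrcxb4498-1\"";
        for (String part : parts(line)) {
            System.out.println(part);
        }
    }
}
```

For the sample line this yields ieatrcxb4498-1, 4498 and 1 from a single pass over the input.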

Upvotes: 1

user3734782

Reputation: 137

You are doing this the hard way. If every line of the lock-unlock.txt file has the same format as the line you posted, you can do the following:

File logFile = new File("lock-unlock.txt");

List<String> lines = Files.readAllLines(logFile.toPath());

List<Integer> ids = lines.stream()
                .filter(line -> line.contains("ieatrcxb"))
                .map(line -> line.split( "\"")[1]) //"ieatrcxb4498-1"
                .map(line -> line.replaceAll("\\D+","")) //"44981"
                .map(Integer::parseInt) // 44981
                .collect( Collectors.toList() );

If you want the full ID rather than just the number, remove or comment out the second and third .map() calls; the result will then be a List of Strings instead of Integers.
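For reference, a minimal sketch of that String variant. To keep it self-contained it takes the lines as a List instead of reading lock-unlock.txt; the class name StringIdsDemo and the method extractIds are made up for this example, and it still assumes every matching line contains exactly one quoted ID.

```java
import java.util.Arrays;
import java.util.List;
import java.util.stream.Collectors;

class StringIdsDemo {

    // Hypothetical helper: keeps lines that mention the prefix and returns the text between the quotes.
    static List<String> extractIds(List<String> lines) {
        return lines.stream()
                .filter(line -> line.contains("ieatrcxb"))
                .map(line -> line.split("\"")[1]) // text after the first quote, e.g. ieatrcxb4498-1
                .collect(Collectors.toList());
    }

    public static void main(String[] args) {
        List<String> lines = Arrays.asList(
            "150: 2017-06-14 18:02:21 INFO  monitorinfo           :     Info: Lock VCS on node \"ieatrcxb4498-1\"",
            "a line without any id");
        System.out.println(extractIds(lines));
    }
}
```

Note that split("\"")[1] throws an ArrayIndexOutOfBoundsException if a line contains the prefix but no quote, so this only fits logs with a strictly uniform format.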

Upvotes: 0

Eritrean

Reputation: 16498

public static void main(String[] args) {
    String line = "150: 2017-06-14 18:02:21 INFO  monitorinfo           :     Info: Lock VCS on node \"ieatrcxb4498-1\"";
    String regex ="ieatrcxb.*1";
    Pattern p = Pattern.compile(regex);
    Matcher m = p.matcher(line);
    while(m.find()){
        System.out.println(m.group());
    }
}

or, if the IDs are always quoted (note that this keeps the surrounding quotes in the result):

 String id = line.substring(line.indexOf("\""), line.lastIndexOf("\"")+1);
 System.out.println(id);
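The substring call above includes the quote characters themselves. If you want the bare ID, a small variation is to shift the indices by one. This is only a sketch under the same assumption (one quoted ID per line); QuotedIdDemo and unquotedId are names invented for this example.

```java
class QuotedIdDemo {

    // Hypothetical helper: returns the text between the first and last double quote, or "" if there is no such pair.
    static String unquotedId(String line) {
        int start = line.indexOf('"');
        int end = line.lastIndexOf('"');
        if (start < 0 || end <= start) {
            return ""; // no quoted section on this line
        }
        return line.substring(start + 1, end);
    }

    public static void main(String[] args) {
        String line = "150: 2017-06-14 18:02:21 INFO  monitorinfo           :     Info: Lock VCS on node \"ieatrcxb4498-1\"";
        System.out.println(unquotedId(line));
    }
}
```

The guard on the indices avoids a StringIndexOutOfBoundsException on lines that contain no quotes or only one.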

Upvotes: 1
