Reputation: 395
I am reading a log file and extracting certain data from it. I am able to extract the time from each line of the log file.
Now I want to extract the id "ieatrcxb4498-1". All of the ids start with the substring ieatrcxb, which I have tried to match in order to return the full string.
I have tried many different suggestions from other posts, but I have been unsuccessful with the following patterns:
(?i)\\b("ieatrcxb"(?:.+?)?)\\b
(?i)\\b\\w*"ieatrcxb"\\w*\\b"
^.*ieatrcxb.*$
I have also tried to extract the full id based on the string starting with i and finishing in 1, as they all do.
Line of the log file
150: 2017-06-14 18:02:21 INFO monitorinfo : Info: Lock VCS on node "ieatrcxb4498-1"
Code
Scanner s = new Scanner(new FileReader(new File("lock-unlock.txt")));
//Record currentRecord = null;
ArrayList<Record> list = new ArrayList<>();
while (s.hasNextLine()) {
    String line = s.nextLine();
    Record newRec = new Record();
    // newRec.time =
    newRec.time = regexChecker("([0-1]?\\d|2[0-3]):([0-5]?\\d):([0-5]?\\d)", line);
    newRec.ID = regexChecker("^.*ieatrcxb.*$", line);
    list.add(newRec);
}
public static String regexChecker(String regEx, String str2Check) {
    Pattern checkRegex = Pattern.compile(regEx);
    Matcher regexMatcher = checkRegex.matcher(str2Check);
    String regMat = "";
    while (regexMatcher.find()) {
        if (regexMatcher.group().length() != 0) {
            regMat = regexMatcher.group();
        }
        //System.out.println("Inside the "+ regexMatcher.group().trim());
    }
    return regMat;
}
I need a simple pattern which will do this for me.
Upvotes: 1
Views: 997
Reputation:
Does the ID always have the format "ieatrcxb, followed by 4 digits, followed by -, followed by 1 digit"?
If that's the case, you can do:
regexChecker("ieatrcxb\\d{4}-\\d", line);
Note the {4} quantifier, which matches exactly 4 digits (\\d). If the last digit is always 1, you could also use "ieatrcxb\\d{4}-1".
If the number of digits varies, you can use "ieatrcxb\\d+-\\d+", where + means "1 or more".
You can also use the {} quantifier with the minimum and maximum number of occurrences. Example: "ieatrcxb\\d{4,6}-\\d", where {4,6} means "minimum of 4 and maximum of 6 occurrences" (that's just an example, I don't know if that's your case). This is useful if you know the bounds on how many digits the ID can have.
All of the above work for your case, returning ieatrcxb4498-1. Which one to use will depend on how your input varies.
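As a quick check, here is a minimal, self-contained sketch that runs the three patterns above against the log line from the question. The class name IdPatternDemo and the helper firstMatch are made up for this illustration; firstMatch returns only the first match, unlike the question's regexChecker, which keeps the last one.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class IdPatternDemo {
    // Hypothetical helper: returns the first match of regex in text, or "" if none.
    static String firstMatch(String regex, String text) {
        Matcher m = Pattern.compile(regex).matcher(text);
        return m.find() ? m.group() : "";
    }

    public static void main(String[] args) {
        String line = "150: 2017-06-14 18:02:21 INFO monitorinfo : Info: "
                + "Lock VCS on node \"ieatrcxb4498-1\"";

        // Exactly 4 digits, a dash, then 1 digit.
        System.out.println(firstMatch("ieatrcxb\\d{4}-\\d", line));   // ieatrcxb4498-1
        // One or more digits on both sides of the dash.
        System.out.println(firstMatch("ieatrcxb\\d+-\\d+", line));    // ieatrcxb4498-1
        // Between 4 and 6 digits, a dash, then 1 digit.
        System.out.println(firstMatch("ieatrcxb\\d{4,6}-\\d", line)); // ieatrcxb4498-1
    }
}
```

None of these patterns uses ^, $ or .*, which is why they return only the id rather than the whole line.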
If you want just the numbers without the ieatrcxb part (4498-1), you can use a lookbehind regex:
regexChecker("(?<=ieatrcxb)\\d{4,6}-\\d", line);
This keeps ieatrcxb out of the match, thus returning just 4498-1.
If you also don't want the -1 and just want 4498, you can combine this with a lookahead:
regexChecker("(?<=ieatrcxb)\\d{4,6}(?=-\\d)", line);
This returns just 4498.
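Capturing groups are a different technique from the lookarounds above, but they can pull out the same pieces in one pass. The sketch below is only an illustration: the class IdPartsDemo and the helper idParts are hypothetical names, and it assumes the same 4 to 6 digit format used in the example patterns.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class IdPartsDemo {
    // Group 1 is the digits after the prefix, group 2 is the digit after the dash.
    private static final Pattern ID = Pattern.compile("ieatrcxb(\\d{4,6})-(\\d)");

    // Hypothetical helper: returns {number, suffix}, or null if the text has no id.
    static String[] idParts(String text) {
        Matcher m = ID.matcher(text);
        return m.find() ? new String[] { m.group(1), m.group(2) } : null;
    }

    public static void main(String[] args) {
        String line = "150: 2017-06-14 18:02:21 INFO monitorinfo : Info: "
                + "Lock VCS on node \"ieatrcxb4498-1\"";
        String[] parts = idParts(line);
        System.out.println(parts[0]); // 4498
        System.out.println(parts[1]); // 1
    }
}
```

With groups, m.group() still returns the whole id (ieatrcxb4498-1), so a single pattern covers all three variants discussed above.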
Upvotes: 1
Reputation: 137
You are doing this the hard way. If every line of the lock-unlock.txt file looks like the snippet you posted, you can do the following:
File logFile = new File("lock-unlock.txt");
List<String> lines = Files.readAllLines(logFile.toPath());
List<Integer> ids = lines.stream()
        .filter(line -> line.contains("ieatrcxb"))
        .map(line -> line.split("\"")[1])          // "ieatrcxb4498-1"
        .map(line -> line.replaceAll("\\D+", ""))  // "44981"
        .map(Integer::parseInt)                    // 44981
        .collect(Collectors.toList());
If you want the full ID rather than just the digits, remove or comment out the second and third .map() calls. The result will then be a List of Strings instead of Integers, so the declared type has to change to List<String>.
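A minimal sketch of that String variant follows. To keep it runnable without the log file, it takes the lines as a parameter instead of reading lock-unlock.txt; the class QuotedIdDemo, the method quotedIds, and the second sample line are invented for this example. The filter also checks for the opening quote, an added assumption so that split always yields an element at index 1.

```java
import java.util.Arrays;
import java.util.List;
import java.util.stream.Collectors;

class QuotedIdDemo {
    // Hypothetical helper: collects the quoted ids from lines that contain one.
    static List<String> quotedIds(List<String> lines) {
        return lines.stream()
                .filter(line -> line.contains("\"ieatrcxb")) // only lines with a quoted id
                .map(line -> line.split("\"")[1])             // text between the first pair of quotes
                .collect(Collectors.toList());
    }

    public static void main(String[] args) {
        List<String> lines = Arrays.asList(
                "150: 2017-06-14 18:02:21 INFO monitorinfo : Info: "
                        + "Lock VCS on node \"ieatrcxb4498-1\"",
                "made-up line without any id");
        System.out.println(quotedIds(lines)); // [ieatrcxb4498-1]
    }
}
```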
Upvotes: 0
Reputation: 16498
public static void main(String[] args) {
    String line = "150: 2017-06-14 18:02:21 INFO monitorinfo : Info: Lock VCS on node \"ieatrcxb4498-1\"";
    String regex = "ieatrcxb.*1";
    Pattern p = Pattern.compile(regex);
    Matcher m = p.matcher(line);
    while (m.find()) {
        System.out.println(m.group());
    }
}
Or, if the ids are always quoted (note that this keeps the surrounding quotes in the result):
String id = line.substring(line.indexOf("\""), line.lastIndexOf("\"")+1);
System.out.println(id);
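One caveat with ieatrcxb.*1: the .* is greedy, so it runs to the last 1 on the line. That is harmless for the sample line, where the id is the final thing containing a 1, but it over-matches if anything with a 1 follows. The sketch below shows this on a made-up line (the trailing "attempt 1" is not from the question) and compares it with a pattern that stops at the closing quote. GreedyMatchDemo and firstMatch are hypothetical names.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class GreedyMatchDemo {
    // Hypothetical helper: returns the first match of regex in text, or "" if none.
    static String firstMatch(String regex, String text) {
        Matcher m = Pattern.compile(regex).matcher(text);
        return m.find() ? m.group() : "";
    }

    public static void main(String[] args) {
        // Invented line: the text after the closing quote is an assumption.
        String line = "Lock VCS on node \"ieatrcxb4498-1\" attempt 1";

        // Greedy .* extends to the last '1' on the line.
        System.out.println(firstMatch("ieatrcxb.*1", line));    // ieatrcxb4498-1" attempt 1
        // [^"]* stops at the closing quote instead.
        System.out.println(firstMatch("ieatrcxb[^\"]*", line)); // ieatrcxb4498-1
    }
}
```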
Upvotes: 1