Reputation: 11
I want to count the number of lines that each regex matches. The input is a log file that is read in as a Java Stream. I want to apply multiple filters to this stream but count the matches of each one separately.
Stream<String> lines = Files.lines(path);
// regex transformation to predicate for filter method
String[] regs = {".*/e_miete_1\\.html.*", ".*/fa-portal/(.*\\.html|api/.*).*"};
ArrayList<Predicate<String>> compRegs = new ArrayList<>();
for (String reg : regs) {
    compRegs.add(Pattern.compile(reg).asPredicate());
}
// usage of predicate
eMiete = lines
    .filter(compRegs.get(0))
    .count();
clicks = lines
    .filter(compRegs.get(1))
    .count();
System.out.println(eMiete);
System.out.println(clicks);
Upvotes: 1
Views: 846
Reputation: 110
If you want to go over the stream only once, instead of filtering the same log twice, you can create an intermediate data structure that holds the counts so far and reduce over it.
As an illustration with int arrays as the data structure:
Pattern[] regs = {
    Pattern.compile(".*/e_miete_1\\.html.*"),
    Pattern.compile(".*/fa-portal/(.*\\.html|api/.*).*")
};
int[] sums = lines
    .map(line -> {
        // one 0/1 flag per pattern for this line
        int[] matches = new int[regs.length];
        for (int i = 0; i < matches.length; i++) {
            matches[i] = regs[i].matcher(line).matches() ? 1 : 0;
        }
        return matches;
    })
    .reduce((l, r) -> {
        // element-wise sum of two partial counts
        int[] sum = new int[l.length];
        for (int i = 0; i < sum.length; i++) {
            sum[i] = l[i] + r[i];
        }
        return sum;
    })
    .orElseThrow(); // throws NoSuchElementException if the input is empty
System.out.println("eMiete: " + sums[0]);
System.out.println("clicks: " + sums[1]);
For every line it checks which patterns match, and the reduction then accumulates the counts.
The specific implementation above has some drawbacks: it creates a new array at every step of the reduction, and it relies on the array indices lining up with the order of the patterns. These are things you can clean up in an actual implementation.
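One possible cleanup, as a minimal sketch rather than a definitive implementation: give each pattern a name and let a grouping collector do the counting, so there are no arrays to sum by hand and no index bookkeeping. The class name NamedCounts, the method countMatches and the sample lines are made up for illustration. Note that it still creates a small stream per line, and that patterns which never match are simply absent from the resulting map.

```java
import java.util.LinkedHashMap;
import java.util.Map;
import java.util.function.Function;
import java.util.regex.Pattern;
import java.util.stream.Collectors;
import java.util.stream.Stream;

class NamedCounts {

    // Hypothetical helper: counts, per pattern name, how many lines match that pattern.
    static Map<String, Long> countMatches(Stream<String> lines, Map<String, Pattern> patterns) {
        return lines
            // turn every line into the names of all patterns it matches
            .flatMap(line -> patterns.entrySet().stream()
                .filter(e -> e.getValue().matcher(line).matches())
                .map(Map.Entry::getKey))
            // count how often each name was emitted
            .collect(Collectors.groupingBy(Function.identity(), Collectors.counting()));
    }

    public static void main(String[] args) {
        Map<String, Pattern> patterns = new LinkedHashMap<>();
        patterns.put("eMiete", Pattern.compile(".*/e_miete_1\\.html.*"));
        patterns.put("clicks", Pattern.compile(".*/fa-portal/(.*\\.html|api/.*).*"));

        // made-up sample lines standing in for the log file
        Stream<String> lines = Stream.of(
            "GET /e_miete_1.html 200",
            "GET /fa-portal/index.html 200",
            "GET /fa-portal/api/users 200",
            "GET /favicon.ico 404");

        Map<String, Long> counts = countMatches(lines, patterns);
        // getOrDefault covers patterns that never matched
        System.out.println("eMiete: " + counts.getOrDefault("eMiete", 0L));
        System.out.println("clicks: " + counts.getOrDefault("clicks", 0L));
    }
}
```

Looking results up by name instead of by position means the patterns can be reordered without touching the code that prints them.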
Upvotes: 0
Reputation: 499
If you have only two predicates, use Yassin's solution with Collectors.teeing(). For a variable number of predicates, you can use:
String[] regs = {.....};
ArrayList<Predicate<String>> compRegs = new ArrayList<>();
for (String reg : regs) {
    compRegs.add(Pattern.compile(reg).asPredicate());
}
int[] countPerPredicate = lines.collect(
    () -> new int[compRegs.size()],    // supplier
    (int[] arr, String line) -> {      // accumulator
        for (int i = 0; i < arr.length; i++) {
            if (compRegs.get(i).test(line)) {
                arr[i]++;
            }
        }
    },
    (int[] arr1, int[] arr2) -> {      // combiner
        for (int i = 0; i < arr1.length; i++) {
            arr1[i] += arr2[i];
        }
    }
);
//System.out.println(Arrays.toString(countPerPredicate));
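Because Files.lines keeps the file open until the stream is closed, it is worth wrapping the whole thing in try-with-resources. Here is a minimal sketch of how the pieces might fit together; the class name LogCounter, the method countInFile and the sample data are hypothetical, and the patterns are the ones from the question:

```java
import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import java.util.function.Predicate;
import java.util.regex.Pattern;
import java.util.stream.Stream;

class LogCounter {

    // Hypothetical helper: one pass over the file, one counter per regex.
    static int[] countInFile(Path path, String... regs) throws IOException {
        List<Predicate<String>> compRegs = new ArrayList<>();
        for (String reg : regs) {
            compRegs.add(Pattern.compile(reg).asPredicate());
        }
        // try-with-resources closes the underlying file once counting is done
        try (Stream<String> lines = Files.lines(path)) {
            return lines.collect(
                () -> new int[compRegs.size()],    // supplier
                (int[] arr, String line) -> {      // accumulator
                    for (int i = 0; i < arr.length; i++) {
                        if (compRegs.get(i).test(line)) {
                            arr[i]++;
                        }
                    }
                },
                (int[] arr1, int[] arr2) -> {      // combiner
                    for (int i = 0; i < arr1.length; i++) {
                        arr1[i] += arr2[i];
                    }
                });
        }
    }

    public static void main(String[] args) throws IOException {
        // made-up sample data written to a temporary file
        Path log = Files.createTempFile("access", ".log");
        Files.write(log, Arrays.asList(
            "GET /e_miete_1.html 200",
            "GET /fa-portal/index.html 200",
            "GET /fa-portal/api/users 200",
            "GET /favicon.ico 404"));
        int[] counts = countInFile(log,
            ".*/e_miete_1\\.html.*",
            ".*/fa-portal/(.*\\.html|api/.*).*");
        System.out.println(Arrays.toString(counts));
        Files.delete(log);
    }
}
```

The result array has one entry per regex, in the order the regexes were passed in.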
Upvotes: 2
Reputation: 21995
To avoid going through your Stream twice, you could use Collectors#teeing (available since Java 12) together with Collectors#filtering and Collectors#counting:
Stream<String> lines = Files.lines(path);
String[] regs = {".*/e_miete_1\\.html.*", ".*/fa-portal/(.*\\.html|api/.*).*"};
Predicate<String> eMietsPredicate = Pattern.compile(regs[0]).asPredicate();
Predicate<String> clicksPredicate = Pattern.compile(regs[1]).asPredicate();
long[] result = lines
    .collect(Collectors.teeing(
        Collectors.filtering(
            eMietsPredicate, Collectors.counting()
        ),
        Collectors.filtering(
            clicksPredicate, Collectors.counting()
        ),
        (eMiete, clicks) -> new long[]{ eMiete, clicks }
    ));
If you're already on Java 16 or later, you can go further and use a locally defined record:
Stream<String> lines = Files.lines(path);
String[] regs = {".*/e_miete_1\\.html.*", ".*/fa-portal/(.*\\.html|api/.*).*"};
Predicate<String> eMietsPredicate = Pattern.compile(regs[0]).asPredicate();
Predicate<String> clicksPredicate = Pattern.compile(regs[1]).asPredicate();
record Result(long eMiete, long clicks) {}
Result result = lines
    .collect(Collectors.teeing(
        Collectors.filtering(
            eMietsPredicate, Collectors.counting()
        ),
        Collectors.filtering(
            clicksPredicate, Collectors.counting()
        ),
        Result::new
    ));
System.out.println(result); // Result[eMiete=15, clicks=35]
Upvotes: 2