GixXboy

Reputation: 11

Java Multiple Regex filter on one Stream

I want to count how many lines each regex matches. The input is a log file that is read in as a Java Stream. I want to apply multiple filters to this stream but count the matches of each one separately.

Stream<String> lines = Files.lines(path);

// regex transformation to predicate for filter method
String[] regs = {".*/e_miete_1\\.html.*", ".*/fa-portal/(.*\\.html|api/.*).*"};
ArrayList<Predicate<String>> compRegs = new ArrayList<>();

for (String reg : regs) {
    compRegs.add(Pattern.compile(reg).asPredicate());
}

// usage of predicate
eMiete = lines
        .filter(compRegs.get(0))
        .count();

clicks = lines
        .filter(compRegs.get(1))
        .count();

System.out.println(eMiete);
System.out.println(clicks);

Upvotes: 1

Views: 846

Answers (3)

groot sarchy

Reputation: 110

If you want to go over the stream only once, instead of filtering the same log twice, you can create an intermediate data structure that holds the counts so far and reduce over it.

As an illustration with int arrays as the data structure:

Pattern[] regs = {
    Pattern.compile (".*/e_miete_1\\.html.*"),
    Pattern.compile (".*/fa-portal/(.*\\.html|api/.*).*")
};

int [] sums = lines
    .map (line -> {
        int[] matches = new int[regs.length];
        for ( int i = 0; i < matches.length; i++ ) {
            matches[i] = regs[i].matcher (line).matches () ? 1 : 0;
        }
        return matches;
    })
    .reduce ((l, r) -> {
        int [] sum = new int [l.length];
        for ( int i = 0; i < sum.length; i++ ) {
            sum[i] = l[i] + r[i];
        }
        return sum;
    })
    .orElseThrow ();

System.out.println ("eMiete: " + sums[0]);
System.out.println ("clicks: " + sums[1]);

At every line it checks which patterns are matched, and during the reduction we accumulate all the counts.

The specific implementation above has some drawbacks: it creates a new array for every step of the reduction, and it relies on each pattern's index matching its position in the result array. These are things you can clean up in the actual implementation.
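One possible cleanup of both drawbacks is sketched below. It uses a mutable `collect` so that one container is reused instead of allocating an array per step, and a map keyed by label instead of a positional index. The class name `NamedPatternCounts`, the labels, and the sample log lines are made up for illustration. It also calls `find()` rather than `matches()`, which gives the same result here only because both patterns start and end with `.*`.

```java
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;
import java.util.regex.Pattern;

public class NamedPatternCounts {

    // Counts, per label, how many lines contain a match for that label's pattern.
    static Map<String, Long> count(List<String> lines, Map<String, Pattern> patterns) {
        return lines.stream().<Map<String, Long>>collect(
                () -> {
                    // Start every label at zero so labels without matches still appear.
                    Map<String, Long> counts = new LinkedHashMap<>();
                    patterns.keySet().forEach(label -> counts.put(label, 0L));
                    return counts;
                },
                (counts, line) -> patterns.forEach((label, pattern) -> {
                    if (pattern.matcher(line).find()) {
                        counts.merge(label, 1L, Long::sum);
                    }
                }),
                // Only used for parallel streams: fold the right map into the left one.
                (left, right) -> right.forEach((label, n) -> left.merge(label, n, Long::sum)));
    }

    public static void main(String[] args) {
        Map<String, Pattern> patterns = new LinkedHashMap<>();
        patterns.put("eMiete", Pattern.compile(".*/e_miete_1\\.html.*"));
        patterns.put("clicks", Pattern.compile(".*/fa-portal/(.*\\.html|api/.*).*"));

        // Hypothetical log lines, not taken from the original log file.
        List<String> lines = List.of(
                "GET /e_miete_1.html HTTP/1.1",
                "GET /fa-portal/index.html HTTP/1.1",
                "GET /fa-portal/api/users HTTP/1.1",
                "GET /other.css HTTP/1.1");

        System.out.println(count(lines, patterns)); // {eMiete=1, clicks=2}
    }
}
```

Adding a third pattern then only means adding one more entry to the map.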

Upvotes: 0

Daniel Zin

Reputation: 499

If you have only two predicates, use Yassin's solution with Collectors.teeing(). For a variable number of predicates, you can use the three-argument Stream.collect():

    String[] regs = {.....};
    ArrayList<Predicate<String>> compRegs = new ArrayList<>();

    for(String reg : regs) {
        compRegs.add(Pattern.compile(reg).asPredicate());
    }

    int[] countPerPredicate = lines.collect(
            ()->new int[compRegs.size()],               // supplier 
          
            (int[] arr, String line)->{                 // accumulator
                for (int i=0; i<arr.length; i++) {
                    if (compRegs.get(i).test(line)) {
                        arr[i]++;
                    }
                }
            },
            (int[] arr1, int[] arr2) -> {               // combiner
                for (int i=0; i<arr1.length; i++) {
                    arr1[i] += arr2[i];
                }
            }
        );

    //System.out.println(Arrays.toString(countPerPredicate ));
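To try this out without a log file, here is a minimal self-contained sketch of the same three-argument `collect`. The class name `CountPerPredicate` and the sample lines are assumptions made for illustration; the two regexes are the ones from the question.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import java.util.function.Predicate;
import java.util.regex.Pattern;

public class CountPerPredicate {

    // Returns one count per predicate, in the same order as the predicate list.
    static int[] count(List<String> lines, List<Predicate<String>> predicates) {
        return lines.stream().collect(
                () -> new int[predicates.size()],            // supplier: zeroed counters
                (int[] counts, String line) -> {             // accumulator: test each predicate
                    for (int i = 0; i < counts.length; i++) {
                        if (predicates.get(i).test(line)) {
                            counts[i]++;
                        }
                    }
                },
                (int[] left, int[] right) -> {               // combiner: only used in parallel
                    for (int i = 0; i < left.length; i++) {
                        left[i] += right[i];
                    }
                });
    }

    public static void main(String[] args) {
        String[] regs = {".*/e_miete_1\\.html.*", ".*/fa-portal/(.*\\.html|api/.*).*"};
        List<Predicate<String>> compRegs = new ArrayList<>();
        for (String reg : regs) {
            compRegs.add(Pattern.compile(reg).asPredicate());
        }

        // Hypothetical log lines, not taken from the original log file.
        List<String> lines = List.of(
                "GET /e_miete_1.html HTTP/1.1",
                "GET /fa-portal/index.html HTTP/1.1",
                "GET /fa-portal/api/users HTTP/1.1",
                "GET /other.css HTTP/1.1");

        System.out.println(Arrays.toString(count(lines, compRegs))); // [1, 2]
    }
}
```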

Upvotes: 2

Yassin Hajaj

Reputation: 21995

To avoid going through your Stream twice, you could use Collectors#teeing (available since Java 12) together with Collectors#filtering and Collectors#counting.

Stream<String> lines = Files.lines(path);
String[] regs = {".*/e_miete_1\\.html.*", ".*/fa-portal/(.*\\.html|api/.*).*"};
Predicate<String> eMietsPredicate = Pattern.compile(regs[0]).asPredicate();
Predicate<String> clicksPredicate = Pattern.compile(regs[1]).asPredicate();

// lines is already a Stream<String>, so collect is called on it directly
long[] result = lines
                    .collect(Collectors.teeing(
                        Collectors.filtering(
                            eMietsPredicate, Collectors.counting()
                        ),
                        Collectors.filtering(
                            clicksPredicate, Collectors.counting()
                        ),
                        (eMiete, clicks) -> new long[]{ eMiete, clicks }
                    ));

If you're already on a Java version with records (standard since Java 16), you can go further and use a locally defined record:

Stream<String> lines = Files.lines(path);
String[] regs = {".*/e_miete_1\\.html.*", ".*/fa-portal/(.*\\.html|api/.*).*"};
Predicate<String> eMietsPredicate = Pattern.compile(regs[0]).asPredicate();
Predicate<String> clicksPredicate = Pattern.compile(regs[1]).asPredicate();

record Result(long eMiete, long clicks) {}

// lines is already a Stream<String>, so collect is called on it directly
Result result = lines
                    .collect(Collectors.teeing(
                        Collectors.filtering(
                            eMietsPredicate, Collectors.counting()
                        ),
                        Collectors.filtering(
                            clicksPredicate, Collectors.counting()
                        ),
                        Result::new
                    ));

System.out.println(result); // Result[eMiete=15, clicks=35]
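A stream returned by Files.lines keeps the file open until it is closed, so it is usually wrapped in try-with-resources. The sketch below combines that with the teeing collector. The class name `TeeingCounts`, the temporary file, and the sample lines are assumptions for illustration only, and a Java version with records is assumed.

```java
import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.List;
import java.util.function.Predicate;
import java.util.regex.Pattern;
import java.util.stream.Collectors;
import java.util.stream.Stream;

public class TeeingCounts {

    record Result(long eMiete, long clicks) {}

    // Counts both kinds of lines in a single pass over the stream.
    static Result count(Stream<String> lines) {
        Predicate<String> eMiete = Pattern.compile(".*/e_miete_1\\.html.*").asPredicate();
        Predicate<String> clicks = Pattern.compile(".*/fa-portal/(.*\\.html|api/.*).*").asPredicate();
        return lines.collect(Collectors.teeing(
                Collectors.filtering(eMiete, Collectors.counting()),
                Collectors.filtering(clicks, Collectors.counting()),
                Result::new));
    }

    public static void main(String[] args) throws IOException {
        // Hypothetical log content written to a temporary file for the demo.
        Path log = Files.createTempFile("access", ".log");
        Files.write(log, List.of(
                "GET /e_miete_1.html HTTP/1.1",
                "GET /fa-portal/index.html HTTP/1.1",
                "GET /fa-portal/api/users HTTP/1.1",
                "GET /other.css HTTP/1.1"));

        // try-with-resources closes the underlying file once counting is done.
        try (Stream<String> lines = Files.lines(log)) {
            System.out.println(count(lines)); // Result[eMiete=1, clicks=2]
        } finally {
            Files.delete(log);
        }
    }
}
```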

Upvotes: 2
