Reputation: 51

Find the longest section of repeating characters

Imagine a string like this:

#*****~~~~~~**************~~~~~~~~***************************#

I am looking for an elegant way to find the indices of the longest continues section that contains a specific character. Let's assume we are searching for the * character, then I expect the method to return the start and end index of the last long section of *.

I am looking for the elegant way, I know I could just bruteforce this by checking something like

indexOf(*)
lastIndexOf(*)
//Check if in between the indices is something else if so, remember length start from new 
//substring and repeat until lastIndex reached
//Return saved indices

This is so ugly brute-force - Any more elegant way of doing this? I thought about regular expression groups and comparing their length. But how to get the indices with that?

Upvotes: 2

Answers (3)

Alexander Ivanchenko

Reputation: 29048

Regex-based solution

If you don't want to hardcode a specific character like * and find "Find the longest section of repeating characters" as the title of the question states, then the proper regular expression for the section of repeated characters would be:

"(.)\\1*"

Where (.) a group that consists from of a single character, and \\1 is a backreference that refers to that group. * is greedy quantifier, which means that presiding backreference could be repeated zero or more times.

Finally, "(.)\\1*" captures a sequence of subsequent identical characters.

Now to use it, we need to compile the regex into Pattern. This action has a cost, hence if the regex would be used multiple times it would be wise to declare a constant:

public static final Pattern REPEATED_CHARACTER_SECTION = 
    Pattern.compile("(.)\\1*");

Using features of modern Java, the longest sequence that matches the above pattern could be found literally with a single line of code.

Since Java 9 we have method Matcher.results() which return a stream of MatchResult objects, describe a matching group.

MatchResult.start() MatchResult.end() expose the way of accessing start and end indices of the group. To extract the group itself, we need to invoke MatchResult.group().

That how an implementation might look like:

public static void printLongestRepeatedSection(String str) {
    
    String longestSection = REPEATED_CHARACTER_SECTION.matcher(str).results() // Stream<MatchResult>
        .map(MatchResult::group)                                              // Stream<String>
        .max(Comparator.comparingInt(String::length))                         // find the longest string in the stream
        .orElse(""); // or orElseThrow() if you don't want to allow an empty string to be received as an input
    
    System.out.println("Longest section:\t" + longestSection);
}

main()

public static void printLongestRepeatedSection(String str) {

    MatchResult longestSection = REPEATED_CHARACTER_SECTION.matcher(str).results() // Stream<MatchResult>
        .max(Comparator.comparingInt(m -> m.group().length()))                     // find the longest string in the stream
        .orElseThrow(); // would throw an exception is an empty string was received as an input

    System.out.println("Section start:   " + longestSection.start());
    System.out.println("Section end:     " + longestSection.end());
    System.out.println("Longest section: " + longestSection.group());
}

Output:

Section start:   34
Section end:     61
Longest section: ***************************

Links:

^{Official tutorials on Lambda expressions and Stream API provided by Oracle}
^{A quick tutorial on Regular expressions}

Simple and Performant Iterative solution

You can do it without regular expressions by manually iterating over the indices of the given string and checking if the previous character matches the current one.

You just need to maintain a couple of variables denoting the start and the end of the longest previously encountered section, and a variable to store the starting index of the section that is being currently examined.

That's how it might be implemented:

public static void printLongestRepeatedSection(String str) {
    if (str.isEmpty()) throw new IllegalArgumentException();

    int maxStart = 0;
    int maxEnd = 1;
    
    int curStart = 0;

    for (int i = 1; i < str.length(); i++) {
        if (str.charAt(i) != str.charAt(i - 1)) {   // current and previous characters are not equal
            if (maxEnd - maxStart < i - curStart) { // current repeated section is longer then the maximum section discovered previously
                maxStart = curStart;
                maxEnd = i;
            }
            curStart = i;
        }
    }
    
    if (str.length() - curStart > maxEnd - maxStart) { // checking the very last section
        maxStart = curStart;
        maxEnd = str.length();
    }

    System.out.println("Section start: " + maxStart);
    System.out.println("Section end:   " + maxEnd);
    System.out.println("Section:   " + str.substring(maxStart,  maxEnd));
}

main()

public static void main(String[] args) {
    String source = "#*****~~~~~~**************~~~~~~~~***************************#";
    
    printLongestRepeatedSection(source);
}

Output:

Section start: 34
Section end:   61
Section:   ***************************

Upvotes: 2

Nowhere Man

Reputation: 19575

The solution based on the regular expression may use some features from Stream API to get an array of indexes of the longest sequence of a given character:

Pattern.quote should be used to safely wrap the input character for the search within a regular expression
Stream<MatchResult> returned by Matcher::results provides necessary information about the start and end of the match
Stream::max allows to select the longest matching sequence
Optional::map and Optional::orElseGet help convert the match into the desired array of indexes

public static int[] getIndexes(char c, String str) {
    return Pattern.compile(Pattern.quote(Character.toString(c)) + "+")
        .matcher(str)
        .results()
        .max(Comparator.comparing(mr -> mr.end() - mr.start()))
        .map(mr -> new int[]{mr.start(), mr.end()})
        .orElseGet(() -> new int[]{-1, -1});
}

// Test:
System.out.println(Arrays.toString(getIndexes('*', "#*****~~~~~~**************~~~~~~~~***************************#")));
// -> [34, 61]

Upvotes: 1

Abra

Reputation: 20913

Use methods of class Matcher.

import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class Solution {

    public static void main(String args[]) {
        String str = "#*****~~~~~~**************~~~~~~~~***************************#";
        Pattern pattern = Pattern.compile("\\*+");
        Matcher matcher = pattern.matcher(str);
        int max = 0;
        while (matcher.find()) {
            int length = matcher.end() - matcher.start();
            if (length > max) {
                max = length;
            }
        }
        System.out.println(max);
    }
}

The regular expression searches for occurrences of one or more asterisk (*) characters.

Method end returns the index of the first character after the last character matched and method start returns the index of the first character matched. Hence the length is simply the value returned by method end minus the value returned by method start.

Each subsequent call to method find starts searching from the end of the previous match.

The only thing left is to get the longest string of asterisks.

Upvotes: 2

Find the longest section of repeating characters

Answers (3)

Regex-based solution

Simple and Performant Iterative solution

Related Questions