Reputation: 499
Suppose you are given a string of length n
and there are 26 ASCII characters that can constitute the string. Identical characters are grouped together in the string (e.g aaabbbcccddd). Find the most frequent character.
The brute force is to scan the entire string and iteratively count the number of characters which takes O(n) time with O(1) space. Is there a O(log n) solution for this? I was given a hint to binary search to find the partition that contains the most frequent character, but I am not sure how binary search would help me eliminating the half of the string.
Upvotes: 1
Views: 660
Reputation: 11947
You can use binary search to find the last occurrence (or rightmost occurrence) of a character in a sub-range of a string to solve this problem.
You use it to hop across the input string, from character to character, no matter how often the same character is being repeated. While you hop, you remember the most frequent character so far and how often it was repeated.
middle( left : Integer, right : integer) : (mid : Integer, valid : Boolean)
find-rightmost(s : String, c : Char, left : Integer, right : Integer) : (lastIndex : Integer, found : Boolean)
find-rightmost
is classic binary search in recursive form.
most-frequent-char(s : String) : (c : Char, frequency : Integer)?
This yields a total complexity of about O(m*log(n)) where m is the number of disjoint characters.
Since pseudo code cannot easily be verified to actually "work", here the same in Common Lisp:
(defparameter *test-cases*
'(""
"a"
"ab"
"aabcd"
"aaaaaaaaaa"
"abbbbbbbcccdd"))
(defun middle (left right)
(let ((mid (+ left (floor (- right left) 2))))
(values mid (/= mid left))))
(defun find-rightmost (s c left right)
(if (< left right)
(multiple-value-bind (mid valid) (middle left right)
(cond
((char-equal c (aref s right))
(values right t))
((and valid (char-equal c (aref s mid)))
(find-rightmost s c mid right))
(valid
(find-rightmost s c left mid))
(t (values left (char-equal c (aref s left))))))
(values left nil)))
(defun most-frequent-char (s)
(let* ((n (length s))
(n-1 (- n 1)))
(cond
((= n 0)
nil)
((= n 1)
(list :char (aref s 0) :freq 1))
(t ;; string length > 1 - we actually have to work...
(loop
with i = 0
with cmaxf = nil
with nmax = 0
while (< i n-1)
for c = (aref s i)
do (multiple-value-bind (ilast found)
(find-rightmost s c (+ i 1) n-1)
(if found
(let ((freq (+ 1 (- ilast i))))
(when (> freq nmax)
(setf nmax freq)
(setf cmaxf c))
(setf i (+ ilast 1)))
(progn
(when (= 0 nmax)
(setf nmax 1)
(setf cmaxf c))
(incf i))))
finally (return (list :char cmaxf :freq nmax)))))))
(defun run-tests ()
(mapcar #'(lambda (in)
(list in (most-frequent-char in)))
*test-cases*))
CL-USER> (format t "~{~S~%~}~%" (run-tests))
("" NIL)
("a" (:CHAR #\a :FREQ 1))
("ab" (:CHAR #\a :FREQ 1))
("aabcd" (:CHAR #\a :FREQ 2))
("aaaaaaaaaa" (:CHAR #\a :FREQ 10))
("abbbbbbbcccdd" (:CHAR #\b :FREQ 7))
Upvotes: 2
Reputation: 51903
As your characters are grouped the problem is reduced to finding longest sequence of consequent letters...
So yes you can use binary search to find the size of each sequence which is O(m.log(n))
where m<=n
is the number of sequences (distinct letters). As you stated m<=26
you can consider it as constant leading to O(log(n))
The algo would be something like this:
consider your string is char s[n];
and result int l=0;
for (i=0;i<n;)
binary search j
position
of last character the where s[i]==s[j]
where i <= j < n
remember max l=max(l , j+1-i)
i=j+1
continue
the loop from #2
However note that any letter can have only single sequence so strings like abaaaaaa
will not work.
Upvotes: 2