Reputation: 119

Most compact, efficient, readable and clever solution for the "Find the first repeated character in a string" challenge

I have a programming interview tomorrow and I'm practicing for it. I read that one of the common questions is to find the first repeated character in a string, using the prototype

size_t FindFirstRepeatedChar ( char * );

The best solution I could come up with is

#include <string.h>
#include <vector>

char FindFirstRepeatedChar ( char * S )
{
   /* Returns the index of the first repeated character in the string S. 
      Assume S does indeed contain a repeat.
   */
   std::vector<bool> booVec(128,false)
   for (char * c1(S), * c2(S+strlen(S)); c1 != c2; ++c1)
   {
       size_t i((size_t)*c1); 
       if (booVec[i] == false) booVec[i] = true;
       else return (char)i;
   }   
}

but I'm wondering

(1) Is this wrong?

(2) If it's not wrong, is there any way I can save electrons, i.e. optimize it further?

(3) Is there a standard library algorithm that I can exploit to solve it in 1 line?

Upvotes: 0

Answers (5)

Peter - Reinstate Monica

Reputation: 16026

Issues:

What others observed already: the negative chars issue is worst because it will crash. Using vector is inefficient because it allocates memory dynamically, a usually expensive operation. Iterating the string once just for finding its end can be avoided by testing for the null char on the fly.
My main concern is that you silently changed the function's return type from what was required/specified. That's pretty bad in this case because a user only reading the specification/documentation would assume that FindFirstRepeatedChar returns size_t. He could code
```
size_t firstRepIndex = S[FindFirstRepeatedChar ( char * S )];
```
which would not even lead to a warning because char has an integer promotion path to size_t. That "index" (in the range 1..255) can be out of range if S is short so that indexing S with it will be UB.

Returning a char also conveys less information than an index (where is the repetition?). Lastly, an index has values for "no repetition" (0 or the one-past the last element index), if some day you may want to test unknown strings. Chars do not have an "escape value" because in the general case a char can hold any bit pattern, including 0 (although in this case, since no length parameter is present, we must assume 0 terminated string arguments so that 0 can be used to indicate "not found").

Upvotes: 0

Paul Hankin

Reputation: 58271

I might try this:

char FindFirstRepeatedChar(char *S) {
    size_t i = 1;
    while (!memchr(S, S[i], i)) i++;
    return S[i];
}

The repeat must appear in the first 256 characters (by the pigeon-hole principle), so this code always runs in O(1) time.

Upvotes: 0

Edward Doolittle

Reputation: 4100

I'm thinking of ways to use regex matching. This works, assuming there is a duplicate. perl -pe '$_=reverse;s/(.*(.).*\2.*)/\2/s'. It would be better if I could get rid of the reverse, though.

Upvotes: 0

danielschemmel

Reputation: 11116

Your code has a few flaws.

First, char may be signed or unsigned, and casting a signed char to size_t directly may have unintended consequences due to sign extension. This is only interesting of course, if you accept that any characters greater than 127 might exist, which your code currently does not support at all - if this is not the case, you can reduce the size of your vector even further, though.

Second, you perform two iterations, one of which hits the whole string: strlen can only perform its work by running over the whole string... We can add a little more trickery here by setting the initial value of booVec[0] to true: This will trigger the return on the first occurrence, and thus allow us to completely remove the loop exit condition while improving safety.

Third, your code is not unicode aware, although this might be by design.

By replacing char*s with char const*s, we allow the code to work on e.g. string literals as well.

Finally, you can replace your usage of vector with an array. This will slightly increase runtime memory usage, but will probably increase performance significantly (due to not requiring a heap allocation and no bit-manipulation to extract bits from the bitvector).

char FindFirstRepeatedChar(char const* S)
{
    bool booVec[(std::numeric_limits<char>::max)()] = { true };
    for (char const* c(S); ; ++c)
    {
        size_t i(static_cast<unsigned char>(*c)); 
        if (booVec[i] == false) booVec[i] = true;
        else return *c;
    }
}

Finally, for the standard library: Since standard library algorithms require an end iterator, they will be less performant without additional trickery (e.g. writing an iterator type type with interesting semantics) which requires more code than your whole example.

Upvotes: 0

meneldal

Reputation: 1737

Usually interviews aren't expecting you to say that this library already does it since they want to check if you can code and write an algorithm.

As to improve it, I would say in an interview they're probably not asking for the best optimization but more for something easy to read (which would also mean less risk of mistakes when you're writing it).

I would probably write a very simple nested loop within the first one. It is slower but very easy to write. I would move to your solution later if they ask me to make something faster.

The main bug you can get in that program is that you input a char (usually 8 bits) and your vector stores only 128 elements so it will crash if there is a non ASCII character in the string.

Upvotes: 2

Most compact, efficient, readable and clever solution for the &quot;Find the first repeated character in a string&quot; challenge

Answers (5)

Related Questions

Most compact, efficient, readable and clever solution for the "Find the first repeated character in a string" challenge