Reputation: 13397
I wrote the following program which should answer this question
Write an efficient function to find the first nonrepeated character in a string. For instance, the first nonrepeated character in “total” is 'o' and the first nonrepeated character in “teeter” is 'r'. Discuss the efficiency of your algorithm.
This is what I did:
#include <stdio.h>
#include <iostream>
#include <vector>
using namespace std;
class Node
{
public:
Node::Node(char ch)
{
c = ch;
next = NULL;
}
char c;
Node *next;
};
Node* addNode(Node *tail, char ch)
{
if(tail == NULL)
return new Node(ch);
else
{
Node *newN = new Node(ch);
tail->next = newN;
return newN;
}
}
void deleteNode(char ch, Node** head, Node**tail)
{
Node *prev = NULL;
Node *cur = *head;
while(cur!=NULL)
{
if(cur->c == ch)
{
// found cut it
if(prev == NULL)
{
// head cut off
if(*tail == *head)
{
// worst possible, just one element
delete *head;
*head = NULL;
return;
}
else
{
// Head cut off but not just first element
Node *tmp = *head;
*head = (*head)->next;
delete tmp;
return;
}
}
else
{
// delete normal node
if(*tail == cur)
{
// delete tail
Node *tmp = *tail;
*tail = prev;
delete tmp;
return;
}
else
{
// Normal node not tail
prev->next = cur->next;
delete cur;
return;
}
}
}
// no match keep searching
prev = cur;
cur = cur->next;
}
}
int main()
{
char str[] = "total";
char htable[26];
memset(htable, 0, sizeof(char)*26);
Node *head = NULL;
Node *tail = head;
for(unsigned int i=0;;i++)
{
if(str[i] == '\0')
break;
// check first match
char m = htable[str[i]-'a'];
switch(m)
{
case 0:
{
// first time, add it to linked list
htable[str[i]-'a']++;
tail = addNode(tail, str[i]);
if(head == NULL)
head = tail;
}break;
case 1:
{
// bam, cut it out
htable[str[i]-'a']++;
deleteNode(str[i], &head, &tail);
}break;
}
}
if(head != NULL)
printf("First char without repetition: %c", head->c);
else
printf("No char matched");
return 0;
}
and it works (although I didn't free the memory at the end of the program for the linked list). Basically I keep an hashtable with a 0 if a character hasn't been found yet, a 1 if it has been found once (and it's added to the linked list at the tail position) and 2 if there are at least two occurrences of it (and should be removed by the linked list).
What's this program's computational complexity with big-O notation?
Since this algorithm just passes once per each element I think it's O(n), although the removal of the values in the linked list (in the worst case possible) would require additional O(k^2) where k is the length of the alphabet used. Something like O(n+k^2) it's my pick and if the string is very long and the alphabet restricted the algorithm becomes very efficient.
Upvotes: 0
Views: 317
Reputation: 2082
From the algorithm description: "It has to be at least O(n) because you don't know if a character will be repeated until you've read all characters.", also see: Find the first un-repeated character in a string.
In your algorithm I don't see the O(k^2) complexity for removing the elements from the linked list. Deleting an element from a linked list is O(n) + O(1) = O(n), where O(n) is search, O(1) the erase once you've found the node.
As the linked list may only contain a maximum of k elements, the removal takes O(k). Since this is inside the for-loop => O(n*k).
Since k is constant => O(n) complexity -- however, it can implemented in a much simpler way. Again, see https://stackoverflow.com/a/2285561/592636 (using two arrays).
Upvotes: 1
Reputation: 63471
Well, yeah on the surface this looks like O(N)
complexity, but you have introduced undesirable inefficiencies by using dynamic allocation.
However, because you call deleteNode
and that has to search through a list, you no longer have O(N)
complexity.
Think what happens if you have the string:
abcdefghijklmnopqrstuvwxyzzyxwvutsrqponmlkjihgfedcb
This has a complexity of roughly O(N*(N-1)/2)
because every deleteNode
call must scan to the end of the remaining list.
All you really need to do is scan the string once and store the index of each character (if not found already), then scan that array for the lowest index. If you like, you can use an index of -1
to indicate a character has not been encountered. That would be complexity of O(2N)
Upvotes: 1