Johnny Pauling
Johnny Pauling

Reputation: 13397

Computational complexity of a program

I wrote the following program which should answer this question

Write an efficient function to find the first nonrepeated character in a string. For instance, the first nonrepeated character in “total” is 'o' and the first nonrepeated character in “teeter” is 'r'. Discuss the efficiency of your algorithm.

This is what I did:

#include <stdio.h>
#include <iostream>
#include <vector>
using namespace std;

class Node
{
public:
    Node::Node(char ch)
    {
        c = ch;
        next = NULL;
    }
    char c;
    Node *next;
};

Node* addNode(Node *tail, char ch)
{
    if(tail == NULL)
        return new Node(ch);
    else
    {
        Node *newN = new Node(ch);
        tail->next = newN;
        return newN;
    }
}

void deleteNode(char ch, Node** head, Node**tail)
{
    Node *prev = NULL;
    Node *cur = *head;

    while(cur!=NULL)
    {
        if(cur->c == ch)
        {
            // found cut it
            if(prev == NULL)
            {
                // head cut off
                if(*tail == *head)
                {
                    // worst possible, just one element
                    delete *head;
                    *head = NULL;
                    return;
                }
                else
                {
                    // Head cut off but not just first element
                    Node *tmp = *head;
                    *head = (*head)->next;
                    delete tmp;
                    return;
                }
            }
            else
            {
                // delete normal node
                if(*tail == cur)
                {
                    // delete tail
                    Node *tmp = *tail;
                    *tail = prev;
                    delete tmp;
                    return;
                }
                else
                {
                    // Normal node not tail
                    prev->next = cur->next;
                    delete cur;
                    return;
                }
            }
        }
        // no match keep searching
        prev = cur;
        cur = cur->next;
    }
}

int  main()
{
    char str[] = "total";
    char htable[26];

    memset(htable, 0, sizeof(char)*26);

    Node *head = NULL;
    Node *tail = head;

    for(unsigned int i=0;;i++)
    {
        if(str[i] == '\0')
            break;

        // check first match
        char m = htable[str[i]-'a'];
        switch(m)
        {
        case 0:
            {
                // first time, add it to linked list
                htable[str[i]-'a']++;
                tail = addNode(tail, str[i]);
                if(head == NULL)
                    head = tail;
            }break;
        case 1:
            {
                // bam, cut it out
                htable[str[i]-'a']++;
                deleteNode(str[i], &head, &tail);
            }break;
        }

    }

    if(head != NULL)
        printf("First char without repetition: %c", head->c);
    else
        printf("No char matched");

    return 0;
}

and it works (although I didn't free the memory at the end of the program for the linked list). Basically I keep an hashtable with a 0 if a character hasn't been found yet, a 1 if it has been found once (and it's added to the linked list at the tail position) and 2 if there are at least two occurrences of it (and should be removed by the linked list).

What's this program's computational complexity with big-O notation?

Since this algorithm just passes once per each element I think it's O(n), although the removal of the values in the linked list (in the worst case possible) would require additional O(k^2) where k is the length of the alphabet used. Something like O(n+k^2) it's my pick and if the string is very long and the alphabet restricted the algorithm becomes very efficient.

Upvotes: 0

Views: 317

Answers (2)

kfunk
kfunk

Reputation: 2082

From the algorithm description: "It has to be at least O(n) because you don't know if a character will be repeated until you've read all characters.", also see: Find the first un-repeated character in a string.

In your algorithm I don't see the O(k^2) complexity for removing the elements from the linked list. Deleting an element from a linked list is O(n) + O(1) = O(n), where O(n) is search, O(1) the erase once you've found the node.

As the linked list may only contain a maximum of k elements, the removal takes O(k). Since this is inside the for-loop => O(n*k).

Since k is constant => O(n) complexity -- however, it can implemented in a much simpler way. Again, see https://stackoverflow.com/a/2285561/592636 (using two arrays).

Upvotes: 1

paddy
paddy

Reputation: 63471

Well, yeah on the surface this looks like O(N) complexity, but you have introduced undesirable inefficiencies by using dynamic allocation.

However, because you call deleteNode and that has to search through a list, you no longer have O(N) complexity.

Think what happens if you have the string:

abcdefghijklmnopqrstuvwxyzzyxwvutsrqponmlkjihgfedcb

This has a complexity of roughly O(N*(N-1)/2) because every deleteNode call must scan to the end of the remaining list.

All you really need to do is scan the string once and store the index of each character (if not found already), then scan that array for the lowest index. If you like, you can use an index of -1 to indicate a character has not been encountered. That would be complexity of O(2N)

Upvotes: 1

Related Questions