Sato
Sato

Reputation: 8602

fastest way to find an integer in a dataset

The interview question is more complicated, so i simplified it as

  1. input data will be in a format of A,B
  2. A is a number between 0 and 18446744073709551615(mysql bigint)
  3. B is a random string
  4. we will provide IO part

You should provide two functions in c/c++

  1. set(unsigend long long A, char *B)
  2. get(unsigend long long A)

data struct and algorithm is up to you.

Requirements

  1. set should be O(1)
  2. get should be O(1)

Put in mind that we might call set 100 million times

Any ideas? I did not give a good answer

my answer was incomplete:

typedef data {
    unsigned long long A;
    char *B;
    data *next;
}

set is just malloc a new data and append to the list

but failed in get part.

Upvotes: 2

Views: 393

Answers (2)

vadim_hr
vadim_hr

Reputation: 549

I have done this in C this way. I think you will understand my idea from the code. (Note by nos: this algorithm is known as Trie)

#include <stdio.h>
#include <stdlib.h>
#include <string.h>

typedef struct node node;

struct node {
    node *nodes[0x10];
    char *text;
};

node *top;

void set(unsigned long long A, char *B)
{
    unsigned char n;
    node *way;

    way = top;

    for (;A>0;A>>=4)
    {
        n = A & 0xf;

        if (way->nodes[n] == NULL)
        {
            way->nodes[n] = malloc(sizeof(node));
            memset(way->nodes[n], 0, sizeof(node));
        }

        way = way->nodes[n];
    }

    if (way->text != NULL)
    {
        free(way->text);
    }

    way->text = strdup(B);
}

char *get(unsigned long long A)
{
    unsigned char n;
    node *way;

    way = top;

    for (; A>0 && way != NULL; A>>=4)
    {
        n = A & 0xf;
        way = way->nodes[n];
    }

    if (A == 0 && way != NULL)
    {
        return way->text;
    }

    return NULL;
}

int main()
{
    top = malloc(sizeof(node));
    memset(top,0,sizeof(node));

    set(1230294381243, "test1");
    set(12934839, "test2");
    set(1,"tezt");

    printf("%s", get(1230294381243));
    printf("%s", get(12934839));
    printf("%s", get(1));

//    todo: free memory
//    free_way(top); 

    return 0;
}

Max 16 iterations to find any unsigned long long key. This code is 100% working and was tested, except freeing memory from the top variable.

UPDATE. Declaring nodes as an array (suggestion by HolyBlackCat).

UPDATE. Increasing algorithm speed (suggestion by Serge Rogatch)

Upvotes: 3

chux
chux

Reputation: 154255

As an interview question, it is reasonable to consider the impacts of coding goals.

Consider

  1. hash suggested by @PaulMcKenzie

  2. A Trie suggested by @Serge Rogatch and answered by @vadim_hr

  3. Huge arrays char *even[18446744073709551615u/2+1]; char *odd[18446744073709551615u/2+1];

The requirements "set should be O(1), get should be O(1)" puts aside the hash solution as it is not truly O(1). Still a hash can have excellent average speed and resource rating.

Yet there was no memory efficiency requirement, nor memory limit, nor set size (other than an implied < 100 million).

A naive implementation of #3 (Arrays) would certainly exceed real memory budgets, yet it is O(1), in theory. Still not a real candidate even though it meets the stated requirements and within memory limits (which are theoretically unbounded).

The issue with a Trie is its bottom leafs are often a wide array of NULL pointers - making this approach memory intensive. Yet if the set count (unknown) is small this is not a concern.


Put in mind that we might call set 100 million times

This point is not reflected in a trie implementation as that reminder is to also consider over-all efficiency. A trie can be very memory inefficient with a high set count and slower, on average, than a dynamic hash - even with its O(1) get/set.

This is where the interview part comes in to not just provide a technical solution that meets the requirements, certainly a trie, but to presents its advantages (O(1) for get/set) and its short-comings (memory hog), average speed slow than a hash. Thus to insure your customer (the interviewer in this case) is apprised of other reasonable solutions that may better meet the overall objectives.

Upvotes: 0

Related Questions