Grynets
Grynets

Reputation: 2525

Best algorithm to process hash data

Question is about best way to handle data.
Let's assume we have such key -> value data:

"[email protected]": { "name": "John",
                    "age": 20,
                    "job": "developer",
                    "favourite_food": ['taco', 'steak']
                    //...etc
                  }
 //...etc

There is a lot of data for users with key "email", like a million. And usually I had to search users by their email.
But today my boss came up to me and said he want to search users by their names and of course keep possibility to search by email. On the other day he said he want my program to realize search by age and so on.
My first thought was to iterate over data with, for example, this php code:

 foreach($email as $data){
   foreach($data as $k => $v){
     if($v == 'search value'){
       return $email;
     }
   }
 }

But this solution is not good for big amount of data.
My second idea was to iterate over first data and create for each email own table to make it look like this:

$a = "[email protected]": {//all data}
$b = "John" : {//all data including email}
$c = "developer":{//all other data}
// and so on

But my users getting older with time, so I have to update user age every time the data in my main object changes.
So, my question is, what is the best way to implement such logic using any programming language?


Some notes:
It had to be done by using programming language without touching MySQL or any other DB.

Upvotes: 0

Views: 65

Answers (1)

Huang
Huang

Reputation: 609

  1. I think using the year of birth of users instead of age might be better in this situation.

  2. You can use index if you are using database. If not, I think you can create index by yourself. A simple index strategy is:

Do not change the original data, but add index dicts where the keys are index and values are email. Like in python you can add two indices, name and yearofbirth:

name = {"John": ["[email protected]", "[email protected]", "[email protected]"],
        "Mike": ["[email protected]", ...],
        #...etc}
yearofbirth = {"1981":["[email protected]", "[email protected]"],
               #...etc}

In this way, you can search by name or yearofbirth to get the email and then fetch the original data. And it is fast.

Upvotes: 1

Related Questions