Reputation: 1551
This is an interview question from Amazon.
Given lists of u customers who visited w webpages over d days, design a data structure to get the customers who have visited the website on exactly k days and have visited at least m distinct pages altogether.
Assume u, w and d are large numbers.
How do we store this information? Maybe we can use hash maps, trees, etc.
Upvotes: 3
Views: 2256
Reputation: 11
Implement this using two hash maps:
HashMap&lt;customerId, HashSet&lt;webpage&gt;&gt;
HashMap&lt;customerId, HashSet&lt;day&gt;&gt;
If the customer id is new, create a new hash set and insert into it; otherwise add that day or webpage to the existing customer's set. For each customer id, check that the set size is &gt;= m in the first hash map and exactly k in the second hash map; if both hold, output that customer.
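A minimal Java sketch of this two-map idea (class and method names such as TwoMapTracker, recordVisit and query are my own, not from the question):

```java
import java.util.*;

class TwoMapTracker {
    // customerId -> set of distinct pages visited (across all days)
    private final Map<Integer, Set<String>> pagesByCustomer = new HashMap<>();
    // customerId -> set of distinct days on which the customer visited
    private final Map<Integer, Set<Integer>> daysByCustomer = new HashMap<>();

    // Record one visit: customer `customerId` viewed `page` on day `day`.
    void recordVisit(int customerId, String page, int day) {
        pagesByCustomer.computeIfAbsent(customerId, id -> new HashSet<>()).add(page);
        daysByCustomer.computeIfAbsent(customerId, id -> new HashSet<>()).add(day);
    }

    // Customers who visited on exactly k days and saw at least m distinct pages.
    List<Integer> query(int k, int m) {
        List<Integer> result = new ArrayList<>();
        for (Map.Entry<Integer, Set<Integer>> e : daysByCustomer.entrySet()) {
            int customerId = e.getKey();
            if (e.getValue().size() == k
                    && pagesByCustomer.getOrDefault(customerId, Set.of()).size() >= m) {
                result.add(customerId);
            }
        }
        return result;
    }
}
```

Each recordVisit is O(1) expected; query scans all customers, so it is O(u) per query.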
Upvotes: 1
Reputation: 2937
The most straightforward structure would be to emulate what an RDBMS would do (a rough sketch follows the list):
1. Have an object that contains all the basic info: (Customer, Days-Count, Distinct-Pages-Count)
2. Then, add a Tree that functions as an index to this, using (Days-Count, Distinct-Pages-Count) as its key (a B-tree, or something similar)
3. You could also have another index-type structure keyed by Customer. That way, you can update the information quickly
4. Finally, you would need more detailed information to figure out whether a page being visited was unique. This would involve storing distinct-page information under the customer (it is not clear from the question whether this means distinct on a particular day, or across all days)
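A rough Java sketch of that idea, using a TreeMap in place of the B-tree index; the names (IndexedTracker, CountKey, recordVisit, query) are illustrative, and I've assumed "distinct pages" means distinct across all days:

```java
import java.util.*;

class IndexedTracker {
    // Index key: (Days-Count, Distinct-Pages-Count).
    record CountKey(int days, int pages) {}

    // Per-customer "row": distinct days and distinct pages (across all days here).
    static final class CustomerStats {
        final Set<Integer> days = new HashSet<>();
        final Set<String> pages = new HashSet<>();
    }

    // Index by customer id, so updates are quick.
    private final Map<Integer, CustomerStats> byCustomer = new HashMap<>();
    // Index keyed by (days, pages) -> customer ids; a TreeMap stands in
    // for the B-tree an RDBMS would maintain.
    private final TreeMap<CountKey, Set<Integer>> byCounts = new TreeMap<>(
            Comparator.<CountKey>comparingInt(CountKey::days).thenComparingInt(CountKey::pages));

    void recordVisit(int customerId, String page, int day) {
        CustomerStats s = byCustomer.computeIfAbsent(customerId, id -> new CustomerStats());
        CountKey oldKey = new CountKey(s.days.size(), s.pages.size());
        s.days.add(day);
        s.pages.add(page);
        CountKey newKey = new CountKey(s.days.size(), s.pages.size());
        if (!newKey.equals(oldKey)) {
            Set<Integer> oldBucket = byCounts.get(oldKey);
            if (oldBucket != null) {
                oldBucket.remove(customerId);   // drop the stale index entry
            }
            byCounts.computeIfAbsent(newKey, key -> new HashSet<>()).add(customerId);
        }
    }

    // Exactly k days, at least m distinct pages: a range scan over the index.
    List<Integer> query(int k, int m) {
        List<Integer> result = new ArrayList<>();
        byCounts.subMap(new CountKey(k, m), true, new CountKey(k, Integer.MAX_VALUE), true)
                .values()
                .forEach(result::addAll);
        return result;
    }
}
```

Bucketing customers by their exact (days, pages) counts lets the "exactly k days, at least m pages" query run as a range scan over the index, much like an RDBMS range scan.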
Upvotes: 1
Reputation: 55649
Assumptions:
We have queries which give some web page j and ask for a list of users who have visited that page on exactly k days and have visited at least m distinct pages in total.
For the below, assume "map" = "hashmap", which would, ideally, give constant-time queries. One can also use a "treemap" at a logarithmic cost in terms of the number of elements. Similarly for "set", "hashset", "treeset".
I'm assuming the input isn't static, but I won't really change my approach if it is.
Basic idea:
For each user, have a map from web page to a count and a last visited day. Then have a map of user identifiers to those per-user page maps.
Whenever a user visits a page:
If it's a new page, create a map entry for that page with a count of 1 and the last visited day,
otherwise, if the day is the same as the last visited day, do nothing,
otherwise, increment the count and update the last visited day.
Total time per update = O(1)
For queries:
Loop through all users - O(u)
Look up the given page for that user and check that its count equals k - O(1)
Check that the size of that user's page map is >= m - O(1)
Total time per query = O(u)
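A rough Java sketch of this basic structure (all names are my own; I'm treating pages as URL strings and days as integers):

```java
import java.util.*;

class BasicTracker {
    // Per-page state for one user: number of distinct days visited and the last visit day.
    static final class PageStats {
        int dayCount = 1;
        int lastDay;
        PageStats(int day) { this.lastDay = day; }
    }

    // userId -> (page -> PageStats)
    private final Map<Integer, Map<String, PageStats>> byUser = new HashMap<>();

    // O(1) expected per update.
    void recordVisit(int userId, String page, int day) {
        Map<String, PageStats> pages = byUser.computeIfAbsent(userId, id -> new HashMap<>());
        PageStats stats = pages.get(page);
        if (stats == null) {
            pages.put(page, new PageStats(day));   // new page: count = 1
        } else if (stats.lastDay != day) {
            stats.dayCount++;                      // new day for this page
            stats.lastDay = day;
        }                                          // same day: do nothing
    }

    // O(u): users who visited `page` on exactly k days and at least m distinct pages overall.
    List<Integer> query(String page, int k, int m) {
        List<Integer> result = new ArrayList<>();
        for (Map.Entry<Integer, Map<String, PageStats>> e : byUser.entrySet()) {
            Map<String, PageStats> pages = e.getValue();
            PageStats stats = pages.get(page);
            if (stats != null && stats.dayCount == k && pages.size() >= m) {
                result.add(e.getKey());
            }
        }
        return result;
    }
}
```

Updates stay constant time, and the query has to touch every user, matching the O(u) bound above.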
Improvements:
We can have a separate map of web page to unique ID (then obviously use the unique ID instead of the URL in the above map). It won't change the asymptotic complexity, but should improve both space and time by a constant factor.
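For example, a tiny interning map along these lines (purely illustrative):

```java
import java.util.HashMap;
import java.util.Map;

class PageIds {
    private final Map<String, Integer> idsByUrl = new HashMap<>();

    // Return the existing id for a URL, or assign the next free one.
    int idFor(String url) {
        Integer id = idsByUrl.get(url);
        if (id == null) {
            id = idsByUrl.size();
            idsByUrl.put(url, id);
        }
        return id;
    }
}
```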
At the cost of quite a bit of space, we can do the following (in addition to having the map from Basic idea):
For each web page, have an array of sets of user identifiers, where index i in the array is the set of all users that visited that page on exactly i days.
Whenever a user visits a page:
Do the same as in Basic idea.
Move the user from the set for their current number of visit days for that page to the set for the new number (or insert into the 1-day set if the page is new for that user).
Total time per update = O(1)
For queries:
We can find all users that visited any given page on exactly k days in O(1).
From here we run through this smaller list of users (say there are p of them) and check the size of each such user's page map (in O(1) each).
Total time per query = O(p)
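A Java sketch of this improved structure, using a growable list of sets per page in place of the array (again, all names are mine):

```java
import java.util.*;

class BucketedTracker {
    static final class PageStats {
        int dayCount = 1;
        int lastDay;
        PageStats(int day) { this.lastDay = day; }
    }

    // userId -> (page -> PageStats), exactly as in the basic idea.
    private final Map<Integer, Map<String, PageStats>> byUser = new HashMap<>();
    // page -> list where index i holds the users who visited that page on exactly i days.
    private final Map<String, List<Set<Integer>>> usersByVisitDays = new HashMap<>();

    void recordVisit(int userId, String page, int day) {
        Map<String, PageStats> pages = byUser.computeIfAbsent(userId, id -> new HashMap<>());
        PageStats stats = pages.get(page);
        if (stats == null) {
            pages.put(page, new PageStats(day));
            moveUser(page, userId, 0, 1);          // new page: goes into the "1 day" set
        } else if (stats.lastDay != day) {
            int oldCount = stats.dayCount;
            stats.dayCount++;
            stats.lastDay = day;
            moveUser(page, userId, oldCount, stats.dayCount);
        }
    }

    // Shift the user from the `from`-days set to the `to`-days set for this page.
    private void moveUser(String page, int userId, int from, int to) {
        List<Set<Integer>> buckets = usersByVisitDays.computeIfAbsent(page, p -> new ArrayList<>());
        while (buckets.size() <= to) {
            buckets.add(new HashSet<>());
        }
        if (from > 0) {
            buckets.get(from).remove(userId);
        }
        buckets.get(to).add(userId);
    }

    // O(p): only the users who visited `page` on exactly k days are examined.
    List<Integer> query(String page, int k, int m) {
        List<Set<Integer>> buckets = usersByVisitDays.getOrDefault(page, List.of());
        if (k < 1 || k >= buckets.size()) {
            return List.of();
        }
        List<Integer> result = new ArrayList<>();
        for (int userId : buckets.get(k)) {
            if (byUser.get(userId).size() >= m) {   // at least m distinct pages overall
                result.add(userId);
            }
        }
        return result;
    }
}
```

Updates remain O(1) expected, and a query only inspects the users in the k-day bucket for that page, i.e. O(p).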
Feel free to point out if I missed anything.
Upvotes: 4