shantanuraje
shantanuraje

Reputation: 1

Find longest word in dictionary that is a subsequence of a given string (Google TechDevGuide)

This a Google interview questions that is now a problem in Google's Foundations of Programming course. I was trying to understand the solution to the following problem:

https://techdevguide.withgoogle.com/paths/foundational/find-longest-word-in-dictionary-that-subsequence-of-given-string#code-challenge

"Given a string S and a set of words D, find the longest word in D that is a subsequence of S. Word W is a subsequence of S if some number of characters, possibly zero, can be deleted from S to form W, without reordering the remaining characters. Note: D can appear in any format (list, hash table, prefix tree, etc. For example, given the input of S = "abppplee" and D = {"able", "ale", "apple", "bale", "kangaroo"} the correct output would be "apple" "

So the main doubt I have is in the section the specifies An Optimal method O(N+L) for small alphabet. In the previous approach i.e O(N + L log N), the author builds a hash table mapping characters -> sorted indexes in String

So for "abppple", the hash table is:

a -> [0]
b -> [1]
p -> [2,3,4]
l -> [5]
e -> [6]

In the optimal approach for small alphabet, it is mentioned using a dense vector representation instead of the above sparse vector representation.

p -> [2, 2, 3, 4, -1, -1, -1, -1]

What are they trying to represent? The string S = "abppplee" does have 8 characters but I don't get what the dense vector is representing? Is it an error or am I missing something?

Thank you.

Upvotes: 0

Views: 954

Answers (1)

ewcz
ewcz

Reputation: 13087

As far as I understand it, this vector should address the question (with Y set to character p)

Given my last matched character was at index X and my next character to match is Y, where does this match occur?

In other words, if you are at index i in the original string, the element of the vector p[i] tells you where is the closest next character p in the string.

To be more specific, the original string is s=abppplee and we want to verify:

p -> [2, 2, 3, 4, -1, -1, -1, -1]

This means that if we are at index 0, the closest p is at index 2. For index 1, the closest p is also at 2. If we are at index 2, the closest next p is at position 3, etc. The values -1 signify that there is no p in the rest of the string (for example at position 4, the remaining characters are indeed just lee, thus there is no p).

Upvotes: 1

Related Questions