TheProgrammerNextDoor
TheProgrammerNextDoor

Reputation: 11

Use BFS to search through graph which represents Word Chain problem in Java

I need to use Breadth-first-search for a Word chain problem. I have a text file with 5 letter words, the goal is to use these words and create a graph that shows connection between words based on the first words 4 last letters if they can be found in the next word. For example climb → blimp → limps → pismo → moist. So climb goes to blimp because they both have l, i, m and b, but blimp can't go back to climb because the four last letters of climb are not in blimp.

I'm not really sure how to solve this problem and show the graph representation so that I can use BFS on it.

I've tried to implement a graph in Eclipse with Java by using Sedgewicks algorithms 4th edition, however it's quite vague as there seems to be steps missing in these example. I'm a beginner so I would appreciate basic steps or example of code with explanation based on sedgewicks algorithms on how to solve this since i've tried to solve it myself with no success.

I would like a program that shows a directed graph, with the words and edges between the words(vertex) and also be able to to BFS on it.

Upvotes: 1

Views: 780

Answers (1)

Andreas
Andreas

Reputation: 159165

First, for the sake of performance, you build a Map<String, List<String>> of normalized 4-letter combinations to list of words with those 4 letters.

E.g. climb has 5 different combinations of 4 letters that could lead to the word: limb, cimb, clmb, clib, and clim. Since letter order doesn't matter, you normalize that by sorting the letters: bilm, bcim, bclm, bcil, and cilm.

Every word is a node in your graph, and the five normalized 4-letter combinations are available inbound "ports" for edges from other words. With the map in place, you can very quickly find all the word nodes you can navigate to from a given node, by taking the last 4 letters, normalizing them, and looking up in the Map (excluding the node itself, of course).

E.g. with words climb, blimp, and limps, you get the following map:

bcil -> climb
bcim -> climb
bclm -> climb
bilm -> climb, blimp
bilp -> blimp
bimp -> blimp
blmp -> blimp
cilm -> climb
ilmp -> blimp, limps
ilms -> limps
ilps -> limps
imps -> limps
lmps -> limps

Of course, any map entry with only one word in the list is a word pointing to itself, so they are irrelevant and can be removed. Though there is no particular reason to actually do that in the code, for human consumption, the useful map entries are:

bilm -> climb, blimp
ilmp -> blimp, limps

You can now build your edges:

climb --[bilm]-> blimp
blimp --[ilmp]-> limps
limps --[imps]-> 

There is no BFS going on here.
Building the map is O(n).
Creating nodes is O(n).
Creating edges is O(n*m), where m is the average number of edges per node. Since m likely approaches a constant for larger graphs, it can be eliminated.
Which means that total solution is O(n).

Upvotes: 1

Related Questions