Nikolas Leontiou

Reputation: 25

Creating a LinkedList containing unique wordnodes of a text source

I want to create a LinkedList of nodes, one per unique word in a text, where each node also holds a list of the words that follow that word in the text.
I have come up with this code, but it is not correct: the contains method seems to always return false. Any suggestions?

public void train(String sourceText){
    String[] words=sourceText.split("\\s+");
    List<String> textList=Arrays.asList(words);
    starter=textList.get(0);
    System.out.println(starter);
    ListNode prevWord= new ListNode(starter);
    wordList = new LinkedList<ListNode>();
    wordList.add(prevWord);
    for (int i=1;i<textList.size();i++){
        if (wordList.contains(prevWord)){
            prevWord.addNextWord(textList.get(i));
        }
        else{
            wordList.add(prevWord);
            prevWord.addNextWord(textList.get(i));
            }
        prevWord=new ListNode(textList.get(i));
    }
}

class ListNode
{
    private String word;

    private List<String> nextWords;

    ListNode(String word)
    {
        this.word = word;
        nextWords = new LinkedList<String>();
    }

    public String getWord()
    {
        return word;
    }

    public void addNextWord(String nextWord)
    {
        nextWords.add(nextWord);
    }
}

Upvotes: 1

Views: 203

Answers (2)

Nikolas Leontiou

Reputation: 25

This is what finally works. Thank you all for your help, especially Luk. Your code gave me a NullPointerException because wordList was empty when entering the for loop over the nodes.

public void train(String sourceText){
        String[] words=sourceText.split("\\s+");
        //List<String> textList=Arrays.asList(words);
        starter=words[0];
        ListNode prevWord= new ListNode(starter);
        wordList = new LinkedList<>();
        wordList.add(prevWord);
        outerLoop: for (int i = 0; i < words.length; i++) {
            for (ListNode node : wordList) {
                if (node.getWord().equals(words[i])) {
                    if (i != words.length - 1) {
                        node.addNextWord(words[i+1]);
                    }
                    continue outerLoop;
                }
            }

            ListNode node = new ListNode(words[i]);
            wordList.add(node);
            if (i != words.length - 1) {
                node.addNextWord(words[i+1]);
            }
        }

        System.out.println(wordList.size());
}

Upvotes: 1

luk2302

Reputation: 57124

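As you correctly found out, wordList.contains(prevWord) returns false for every newly created node. That is because contains compares elements with equals, and ListNode does not override equals, so Java falls back to the default from Object, which checks identity: two nodes are equal only if they are the exact same object, not if they merely look the same.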
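To illustrate the point, here is a minimal sketch of a node class that overrides equals and hashCode so that contains matches by word. WordNode and EqualsDemo are hypothetical names used only for this illustration, they are not part of the question's code.

```java
import java.util.LinkedList;
import java.util.List;
import java.util.Objects;

class EqualsDemo {
    public static void main(String[] args) {
        List<WordNode> wordList = new LinkedList<>();
        wordList.add(new WordNode("hello"));
        // A different object holding the same word is now found by contains.
        System.out.println(wordList.contains(new WordNode("hello"))); // true
        System.out.println(wordList.contains(new WordNode("world"))); // false
    }
}

// Hypothetical variant of ListNode whose equality is defined by its word.
class WordNode {
    private final String word;
    private final List<String> nextWords = new LinkedList<>();

    WordNode(String word) {
        this.word = word;
    }

    String getWord() {
        return word;
    }

    void addNextWord(String nextWord) {
        nextWords.add(nextWord);
    }

    @Override
    public boolean equals(Object other) {
        if (this == other) {
            return true;
        }
        if (!(other instanceof WordNode)) {
            return false;
        }
        return Objects.equals(word, ((WordNode) other).word);
    }

    // Keep hashCode consistent with equals.
    @Override
    public int hashCode() {
        return Objects.hashCode(word);
    }
}
```

Note that contains only tells you that a matching node exists. To add a next word to it you would still have to fetch the stored node, for example with indexOf followed by get, which is why searching the list by word is the simpler route here.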

But beyond that I think you were on a good track - I still changed your code quite a bit, but you might be able to see some aspects that remained from your attempt:

public void train(String sourceText){
    String[] words = sourceText.split("\\s+");

    LinkedList<ListNode> wordList = new LinkedList<>();
    outerLoop: for (int i = 0; i < words.length; i++) {
        for (ListNode node : wordList) {
            if (node.getWord().equals(words[i])) {
                if (i != words.length - 1) {
                    node.addNextWord(words[i+1]);
                }
                continue outerLoop;
            }
        }

        ListNode node = new ListNode(words[i]);
        wordList.add(node);
        if (i != words.length - 1) {
            node.addNextWord(words[i+1]);
        }            
    }

    System.out.println(wordList.size());
}

Let me explain what this code does:

  • it splits the words (as you did)
  • it creates an empty list of ListNodes
  • it loops over the words
    • it looks for an already existing ListNode in the list that has the same word
    • if one is found, the word after the current word is added to that ListNode's nextWords list
    • if no matching node is found, the code reaches the second part of the loop, where I create a new node
    • that node is added to the list of nodes and, unless it holds the last word, gets the following word added

That code is not perfect: it has duplication and uses a label and continue, but you can get rid of those pretty easily.
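One possible way to drop the label and the duplication is to move the search into a helper method. This is only a sketch: findNode, getNextWords, the TrainDemo wrapper class and returning the list from train are my assumptions, added so the example runs on its own.

```java
import java.util.LinkedList;
import java.util.List;

class TrainDemo {
    // Hypothetical helper: returns the node holding the word, or null if none exists yet.
    static ListNode findNode(List<ListNode> wordList, String word) {
        for (ListNode node : wordList) {
            if (node.getWord().equals(word)) {
                return node;
            }
        }
        return null;
    }

    // Returns the list instead of printing its size (an assumption for this sketch).
    static List<ListNode> train(String sourceText) {
        String[] words = sourceText.split("\\s+");
        List<ListNode> wordList = new LinkedList<>();
        for (int i = 0; i < words.length; i++) {
            ListNode node = findNode(wordList, words[i]);
            if (node == null) {
                // First occurrence of this word: create and store its node.
                node = new ListNode(words[i]);
                wordList.add(node);
            }
            // Every word except the last one has a follower to record.
            if (i < words.length - 1) {
                node.addNextWord(words[i + 1]);
            }
        }
        return wordList;
    }

    public static void main(String[] args) {
        for (ListNode node : train("the cat saw the dog")) {
            System.out.println(node.getWord() + " -> " + node.getNextWords());
        }
    }
}

// ListNode as in the question, plus a getNextWords accessor (assumed) for printing.
class ListNode {
    private final String word;
    private final List<String> nextWords = new LinkedList<>();

    ListNode(String word) {
        this.word = word;
    }

    public String getWord() {
        return word;
    }

    public List<String> getNextWords() {
        return nextWords;
    }

    public void addNextWord(String nextWord) {
        nextWords.add(nextWord);
    }
}
```

The inner search is still linear, so this keeps the behavior of the version above. If the texts get large, a map keyed by word would likely be a better fit than scanning the list for every word.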

Upvotes: 1
