Removing duplicates with Hash Set for a Linked List, I can't understand the work flow of code

The problem is to remove duplicates from an unsorted Linked List. Using a hash set keeps track of the values already in the set.

I do not understand what the if, else block is doing. Especially the code, prev = current.

Does that overwrite the prev from before?

I drew out multiple diagrams on a whiteboard and can't understand the flow of this code

public void removeDuplicateFromUnsortedList() {
    HashSet<Integer> hs = new HashSet<Integer>();

    Node current = head;
    Node prev = null;

    while(current != null) {

        int current_data = current.data;

        if(hs.contains(current_data)) {
            prev.next = current.next;
        }
        else {
            hs.add(current_data);
            prev = current;
        }

        current = current.next;

    }
}

This is the desired result, which works.

Printing out: 1 --> 1 --> 4 --> 2 --> 4 --> 2 --> 2 --> null

Printing out: 1 --> 4 --> 2 --> null

Upvotes: 1

Answers (1)

katbutt

Reputation: 11

I'm going to try to trace this out for you! (I know this is 4+ years old but whatever)

Basically, if the HashSet does not contain the current data, then we aren't removing it from the LL, and we then set prev to be the current node, and curr to be the next node. We're shifting both forward.

If the HashSet does contain the current data, we shift curr forward, but prev stays where it is and prev.next is set to skip over the current node as to remove it.

Our LL:

"index":  0 (head)  1       2       3       4       5       6       7
data:     1         1       4       2       4       2       2       null

First part:

Node curr = head;    // node at index 0
Node prev = null;

Iterations

Iteration 1: curr == head

"index":  0 (head)  1       2       3       4       5       6       7
data:     1         1       4       2       4       2       2       null
          curr

curr_data = curr.data;     // curr_data = 1
if (hs.contains(curr_data)) {/* hs is empty */}
else {
    hs.add(curr_data);
    prev = curr;           // prev = head
}
curr = curr.next;          // curr = node at index 1

// hs = {1}

Iteration 2: curr == node at index 1

"index":  0 (head)  1       2       3       4       5       6       7
data:     1         1       4       2       4       2       2       null
          prev      curr


curr_data = curr.data;     // curr_data = 1
if (hs.contains(curr_data)) {. // hs = {1}
    // now we set the previous node's .next to be the current node's .next,
    // thus removing the current node from the LL
    prev.next = curr.next;
}
curr = curr.next;          // curr = node at index 2

// hs = {1}

After this 2nd iteration, where curr's data was a duplicate, the previous node's .next is changed to be curr's .next, bypassing curr and removing it from the list. prev is not changed in this case, because curr will now be the node at index 2, but the node at index 1 was removed, so prev stays where it is.

Iteration 3: curr == node at index 2

"index":  0 (head)  1       2       3       4       5       6       7
data:     1         1       4       2       4       2       2       null
          prev              curr


curr_data = curr.data;     // curr_data = 4
if (hs.contains(curr_data)) {/* hs = {1} */}
else {
    hs.add(curr_data);
    prev = curr;           // prev = node at index 2
}
curr = curr.next;          // curr = node at index 3

// hs = {1, 4}

Iteration 4: curr == node at index 3

"index":  0 (head)  1       2       3       4       5       6       7
data:     1         1       4       2       4       2       2       null
                            prev    curr


curr_data = curr.data;     // curr_data = 2
if (hs.contains(curr_data)) {/* hs = {1, 4} */}
else {
    hs.add(curr_data);
    prev = curr;           // prev = node at index 3
}
curr = curr.next;          // curr = node at index 4

// hs = {1, 4, 2}

Iteration 5: curr == node at index 4

"index":  0 (head)  1       2       3       4       5       6       7
data:     1         1       4       2       4       2       2       null
                                    prev    curr


curr_data = curr.data;     // curr_data = 4
if (hs.contains(curr_data)) {  // hs = {1, 4, 2}
    // again, now we set the previous node's .next to be the current node's .next,
    // thus removing the current node from the LL
    prev.next = curr.next;
}
curr = curr.next;          // curr = node at index 4

// hs = {1, 4, 2}

Iteration 6: curr == node at index 5

"index":  0 (head)  1       2       3       4       5       6       7
data:     1         1       4       2       4       2       2       null
                    (rem)           prev    (rem)   curr


curr_data = curr.data;     // curr_data = 2
if (hs.contains(curr_data)) {  // hs = {1, 4, 2}
    // again, now we set the previous node's .next to be the current node's .next,
    // thus removing the current node from the LL
    prev.next = curr.next;
}
curr = curr.next;          // curr = node at index 6

// hs = {1, 4, 2}

Iteration 7: curr == node at index 6

"index":  0 (head)  1       2       3       4       5       6       7
data:     1         1       4       2       4       2       2       null
                    (rem)                   (rem)   prev    curr


curr_data = curr.data;     // curr_data = 2
if (hs.contains(curr_data)) {  // hs = {1, 4, 2}
    // again, now we set the previous node's .next to be the current node's .next,
    // thus removing the current node from the LL
    prev.next = curr.next;
}
curr = curr.next;          // curr = node at index 6

// hs = {1, 4, 2}

Upvotes: 0

Removing duplicates with Hash Set for a Linked List, I can&#39;t understand the work flow of code

Answers (1)

Related Questions

Removing duplicates with Hash Set for a Linked List, I can't understand the work flow of code