Reputation: 40385
I'm reading Cracking the Coding Interview, Fourth Edition: 150 Programming Interview Questions and Solutions and I'm trying to solve the following question:
2.1 Write code to remove duplicates from an unsorted linked list. FOLLOW UP: How would you solve this problem if a temporary buffer is not allowed?
I'm solving it in C#, so I made my own Node class:
public class Node<T> where T : class
{
    public Node<T> Next { get; set; }
    public T Value { get; set; }

    public Node(T value)
    {
        Next = null;
        Value = value;
    }
}
My solution is to iterate through the list, and then for each node to iterate through the remainder of the list and remove any duplicates of it (note that I haven't actually compiled or tested this, as instructed by the book):
public void RemoveDuplicates(Node<T> head)
{
    // Iterate through the list
    Node<T> iter = head;
    while (iter != null)
    {
        // Iterate through the remaining nodes in the list
        Node<T> current = iter;
        while (current != null && current.Next != null)
        {
            if (iter.Value == current.Next.Value)
            {
                current.Next = current.Next.Next;
            }
            current = current.Next;
        }
        iter = iter.Next;
    }
}
Here is the solution from the book (the author wrote it in Java):
Without a buffer, we can iterate with two pointers: “current” does a normal iteration, while “runner” iterates through all prior nodes to check for dups. Runner will only see one dup per node, because if there were multiple duplicates they would have been removed already.
public static void deleteDups2(LinkedListNode head)
{
    if (head == null) return;
    LinkedListNode previous = head;
    LinkedListNode current = previous.next;
    while (current != null)
    {
        LinkedListNode runner = head;
        while (runner != current) { // Check for earlier dups
            if (runner.data == current.data)
            {
                LinkedListNode tmp = current.next; // remove current
                previous.next = tmp;
                current = tmp; // update current to next node
                break; // all other dups have already been removed
            }
            runner = runner.next;
        }
        if (runner == current) { // current not updated - update now
            previous = current;
            current = current.next;
        }
    }
}
So my solution always looks for duplicates from the current node to the end, while their solution looks for duplicates from the head to the current node. I feel like both solutions would suffer performance issues depending on how many duplicates there are in the list and how they're distributed (density and position). But in general: is my answer nearly as good as the one in the book or is it significantly worse?
Upvotes: 8
Views: 18914
Reputation: 21
HackerRank Day 24: More Linked Lists, removing duplicate nodes in C#.
static Node RemoveDuplicateNode(Node head)
{
    Node link = head;
    while (link != null)
    {
        // Remove every later node that repeats the current node's data.
        Node previous = link;
        Node duplicateNode = link.next;
        while (duplicateNode != null)
        {
            if (duplicateNode.data == link.data)
            {
                previous.next = duplicateNode.next;   // unlink the duplicate
            }
            else
            {
                previous = duplicateNode;
            }
            duplicateNode = duplicateNode.next;
        }
        link = link.next;
    }
    return head;
}
Upvotes: 0
Reputation: 210
Here's an implementation using a HashSet, in O(n) time.
I used a hash set to store the values seen so far and two node pointers to traverse the linked list. If a duplicate is found, the previous pointer is linked past the current node.
This ensures the removal of duplicate records.
/// <summary>
/// Write code to remove duplicates from an unsorted linked list.
/// </summary>
class RemoveDups<T>
{
    private class Node
    {
        public Node Next;
        public T Data;

        public Node(T value)
        {
            this.Data = value;
        }
    }

    private Node head = null;

    public static void MainMethod()
    {
        RemoveDups<int> rd = new RemoveDups<int>();
        rd.AddNode(15);
        rd.AddNode(10);
        rd.AddNode(15);
        rd.AddNode(10);
        rd.AddNode(10);
        rd.AddNode(20);
        rd.AddNode(30);
        rd.AddNode(20);
        rd.AddNode(30);
        rd.AddNode(35);
        rd.PrintNodes();
        rd.RemoveDuplicates();
        Console.WriteLine("Duplicates Removed!");
        rd.PrintNodes();
    }

    private void RemoveDuplicates()
    {
        //use a hashtable to remove duplicates
        HashSet<T> hs = new HashSet<T>();
        Node current = head;
        Node prev = null;

        //loop through the linked list
        while (current != null)
        {
            if (hs.Contains(current.Data))
            {
                //remove the duplicate record
                prev.Next = current.Next;
            }
            else
            {
                //insert element into hashset
                hs.Add(current.Data);
                prev = current;
            }
            current = current.Next;
        }
    }

    /// <summary>
    /// Add Node at the beginning
    /// </summary>
    /// <param name="val"></param>
    public void AddNode(T val)
    {
        Node newNode = new Node(val);
        newNode.Next = head;
        head = newNode;
    }

    /// <summary>
    /// Print nodes
    /// </summary>
    public void PrintNodes()
    {
        Node current = head;
        while (current != null)
        {
            Console.WriteLine(current.Data);
            current = current.Next;
        }
    }
}
Upvotes: 0
Reputation: 383
C# code for removing duplicates left after the first pass of iteration (note that it only removes adjacent duplicates):
public Node removeDuplicates(Node head)
{
    if (head == null)
        return head;

    var current = head;
    while (current != null)
    {
        if (current.next != null && current.data == current.next.data)
        {
            current.next = current.next.next;
        }
        else
        {
            current = current.next;
        }
    }
    return head;
}
Upvotes: 0
Reputation: 11
Here's the answer in C
void removeduplicates(N **r)
{
    N *temp1 = *r;
    N *temp2 = NULL;
    N *temp3 = NULL;

    if (temp1 == NULL)   /* empty list: nothing to do */
        return;

    while (temp1->next != NULL)
    {
        temp2 = temp1;
        while (temp2 != NULL)
        {
            temp3 = temp2;
            temp2 = temp2->next;
            if (temp2 == NULL)
            {
                break;
            }
            if (temp2->data == temp1->data)
            {
                temp3->next = temp2->next;
                free(temp2);
                temp2 = temp3;
                printf("\na dup deleted");
            }
        }
        temp1 = temp1->next;
    }
}
Upvotes: 0
Reputation: 39
Tried the same in C++. Please let me know your comments on this.
// ConsoleApplication2.cpp : Defines the entry point for the console application.
#include "stdafx.h"
#include <stdlib.h>

struct node
{
    int data;
    struct node *next;
};

struct node *head = NULL;
struct node *tail = NULL;

struct node* createNode(int data)
{
    struct node *newNode = (node*)malloc(sizeof(node));
    newNode->data = data;
    newNode->next = NULL;
    head = newNode;
    tail = newNode;
    return newNode;
}

bool insertAfter(node *list, int data)
{
    struct node *newNode = (node*)malloc(sizeof(node));
    newNode->data = data;

    // case 1 - no anchor node given: insert before the head
    if (!list)
    {
        newNode->next = head;
        head = newNode;
        return true;
    }

    // case 2 - insert after a node in the middle or at the tail of the list
    struct node *curpos = head;
    while (curpos)
    {
        if (curpos == list)
        {
            if (curpos->next == NULL)
            {
                newNode->next = NULL;
                tail = newNode;
            }
            else
            {
                newNode->next = curpos->next;
            }
            curpos->next = newNode;
            return true;
        }
        curpos = curpos->next;
    }

    free(newNode);   // anchor node not found in the list
    return false;
}

void deleteNode(node *runner)
{
    // unlink and free the node that follows runner
    struct node *victim = runner->next;
    runner->next = victim->next;
    if (runner->next == NULL)
        tail = runner;
    free(victim);
}

void removedups(node *list)
{
    struct node *curr = head;
    while (curr != NULL)
    {
        struct node *runner = curr;
        while (runner->next != NULL)
        {
            if (curr->data == runner->next->data)
                deleteNode(runner);      // removes runner->next, so do not advance
            else
                runner = runner->next;
        }
        curr = curr->next;
    }
}

int _tmain(int argc, _TCHAR* argv[])
{
    struct node *list = createNode(1);
    insertAfter(list, 2);
    insertAfter(list, 2);
    insertAfter(list, 3);
    removedups(list);
    return 0;
}
Upvotes: 0
Reputation: 368
Code in Java:
public static void dedup(Node head) {
    Node cur = null;
    HashSet encountered = new HashSet();

    while (head != null) {
        encountered.add(head.data);
        cur = head;
        while (cur.next != null) {
            if (encountered.contains(cur.next.data)) {
                cur.next = cur.next.next;
            } else {
                break;
            }
        }
        head = cur.next;
    }
}
Upvotes: 0
Reputation:
Heapsort is an in-place sort. You could modify the "siftUp" or "siftDown" function to simply remove the element if it encounters a parent that is equal. This would be O(n log n)
function siftUp(a, start, end) is
    input:  start represents the limit of how far up the heap to sift.
            end is the node to sift up.

    child := end
    while child > start
        parent := floor((child - 1) ÷ 2)
        if a[parent] < a[child] then (out of max-heap order)
            swap(a[parent], a[child])
            child := parent (repeat to continue sifting up the parent now)
        else if a[parent] == a[child] then
            remove a[parent]
        else
            return
Upvotes: 1
Reputation: 21752
There's not much of a difference. If I've done my math right, yours is on average N^2/16 comparisons slower than the author's, but plenty of cases exist where your implementation will be faster.
Edit:
I'll call your implementation Y and the author's A
Both proposed solutions have O(N^2) as the worst case, and both have a best case of O(N) when all elements have the same value.
EDIT: This is a complete rewrite. Inspired by the debate in the comments, I tried to find the average case for N random numbers, that is, a sequence with a random size and a random distribution. What would the average case be?
Y will always run U times, where U is the number of unique values. For each iteration it will do N-X comparisons, where X is the number of elements removed prior to that iteration (+1). The first time no element will have been removed, and on average N/U elements will have been removed by the second iteration.
That is, on average ½N elements will be left to iterate over, so we can express the average cost as U*½N. The average U can be expressed in terms of N as well.
Expressing A becomes more difficult. Let's say we use I iterations before we've encountered all unique values. After that it will run between 1 and U comparisons per iteration (on average that's U/2), and it will do that N-I times. That gives
I*c + (U/2)*(N-I)
but what's the average number of comparisons (c) we run for the first I iterations? On average we need to compare against half of the elements already visited, and on average we've visited I/2 elements, i.e. c = I/4, which gives
(I^2)/4 + (U/2)*(N-I).
I can be expressed in terms of N as well: on average we'll need to visit half of N to find all the unique values, so I = N/2, yielding an average of
(N^2)/16 + (U/2)*(N/2), which with U = N/2 reduces to (3*N^2)/16.
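For comparison, plugging the same average U = N/2 into Y's cost estimate from above gives roughly:

Y:  U * ½N = (N/2) * (N/2)      = N^2/4
A:  (3*N^2)/16                  (from above)

Difference: N^2/4 - (3*N^2)/16 = N^2/16 comparisons in A's favour.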
That is of course assuming my estimates of the averages are correct. On average, then, for any potential sequence, A does roughly N^2/16 fewer comparisons than Y, but plenty of cases exist where Y is faster than A. So I'd say they are roughly equal when compared on the number of comparisons.
Upvotes: 4
Reputation: 17599
How about using a HashMap? This way it will take O(n) time and O(n) space. I will write pseudocode.
function removeDup(LinkedList list){
    HashMap map = new HashMap();
    for(i = 0; i < list.length; i++)
        if list.get(i) not in map
            map.add(list.get(i))
        else
            list.remove(i)
        end
    end
end
Of course we assume that HashMap has O(1) read and write.
Another solution is to use merge sort and then remove duplicates from the start to the end of the list. This takes O(n log n):
merge sort is O(n log n), and removing duplicates from a sorted list is O(n) (do you know why?), so the entire operation takes O(n log n). A sketch of this approach is shown below.
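Since the merge-sort route isn't shown in code anywhere in this thread, here is a minimal C# sketch of it, assuming a simple, hypothetical int-valued Node class (not the OP's generic one): sort the list with a standard slow/fast-pointer merge sort, then make one O(n) pass that drops the now-adjacent duplicates. Note that this reorders the list, which may or may not be acceptable.

public class Node
{
    public int Data;
    public Node Next;
    public Node(int data) { Data = data; }
}

public static class DedupBySort
{
    // Sort the list (O(n log n)), then drop adjacent duplicates (O(n)).
    public static Node RemoveDuplicates(Node head)
    {
        head = MergeSort(head);
        Node current = head;
        while (current != null)
        {
            if (current.Next != null && current.Next.Data == current.Data)
                current.Next = current.Next.Next;   // skip the duplicate
            else
                current = current.Next;
        }
        return head;
    }

    // Standard merge sort on a singly linked list.
    private static Node MergeSort(Node head)
    {
        if (head == null || head.Next == null)
            return head;

        // Split the list in half using slow/fast pointers.
        Node slow = head, fast = head.Next;
        while (fast != null && fast.Next != null)
        {
            slow = slow.Next;
            fast = fast.Next.Next;
        }
        Node second = slow.Next;
        slow.Next = null;

        return Merge(MergeSort(head), MergeSort(second));
    }

    // Merge two sorted lists into one sorted list.
    private static Node Merge(Node a, Node b)
    {
        Node dummy = new Node(0);
        Node tail = dummy;
        while (a != null && b != null)
        {
            if (a.Data <= b.Data) { tail.Next = a; a = a.Next; }
            else { tail.Next = b; b = b.Next; }
            tail = tail.Next;
        }
        tail.Next = (a != null) ? a : b;
        return dummy.Next;
    }
}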
Upvotes: 3
Reputation: 59131
If you give a person a fish, they eat for a day. If you teach a person to fish...
My measures for the quality of an implementation are:
As for your implementation:
Upvotes: 9
Reputation: 26904
Your approach is simply the mirror image of the book's: you go forward from the current node, while the book goes backward from the head. There is no real difference, as both of you scan all the elements. And yes, since no buffer is allowed, there are performance issues, but you usually don't have to worry about performance with such constrained questions unless it's explicitly required.
Interview questions are made to test your open-mindedness. I have doubts about Mark's answer: it is definitely the best solution in real-world examples, but even though those algorithms use constant space, the constraint that no temporary buffer is allowed must be respected.
Otherwise, I guess the book would have adopted such an approach. Mark, please forgive me for being critical of you.
Anyway, just to go deeper into the matter: yours and the book's approach both require Theta(n^2) time, while Mark's approach requires Theta(n log n) + Theta(n) time, which results in Theta(n log n). Why Theta? Because comparison-based sorting algorithms are Omega(n log n) too, remember!
Upvotes: 0
Reputation: 68016
Your solution is just as good as the author's, only it has a bug in the implementation :) Try tracing it on a list of nodes with equal data.
Upvotes: 0