Reputation:
I have a small problem and I would like to get your opinion.
I'm dealing with documents than can reference other documents. Starting from any document, I need to get the id of all the documents this document references. The problem is that the circular references are allowed so if A ref B ref C, again C can ref A and I get in the loop. How can I solve this problem in C#?
An small example:
Let suppose that this is a class that represents a document:
public class Document
{
public Document(int id)
{
this.ID = id;
}
private int m_ID;
public int ID
{
get { return m_ID; }
set { m_ID = value; }
}
private List<Document> m_Children = new List<Document>();
public List<Document> Children
{
get { return m_Children; }
set { m_Children = value; }
}
private List<Document> m_Parent = new List<Document>();
public List<Document> Parent
{
get { return m_Parent; }
set { m_Parent = value; }
}
public Document AddChild(Document child)
{
child.Parent.Add(this);
this.Children.Add(child);
return child;
}
public Document AddChild(int child)
{
Document d = new Document(child);
return AddChild(d);
}
}
Now let's create a Document class that has some references:
public static Document CreateReferences()
{
Document d = new Document(1);
Document temp = d.AddChild(2);
for (int i = 3; i < 6; i++)
{
temp = temp.AddChild(i);
}
temp.AddChild(d);
return d;
}
Now I need to implement a method in Document class like
public List<int> GetReferencedDocuments()
{ }
What is the best way to do that? Any specific algorithm can be implemented?
Any suggestion is well accepted!
Thanks
Upvotes: 0
Views: 1198
Reputation: 10814
Example implementation:
public List<int> GetReferencedDocuments()
{
var referencedIds = new List<int>();
var queue = new Queue<Document>(this);
while (queue.Count > 0)
{
var newDocuments = queue.Dequeue().Children
.Where(d => !referencedIds.Contains(d.ID))
foreach (Document newDocument in newDocuments)
{
queue.Enqueue(newDocument);
referencedIds.Add(newDocument.ID);
}
}
return referencedIds;
}
Upvotes: 0
Reputation: 3129
There are two main approaches to resolving this sort of recursive search on recursive data: marking or recording.
Marking: every time you list a document, flag it as viewed. Do not process flagged documents.
So your GetReferenceDocuments would look a little like this:
GetReferencedDocuments(startpoint)
if(startpoint.flagged) return null
startpoint.flag
new list result =startpoint
foreach(subdocument in
documents.children)
result.append(getreferenceddocuments(subdocuments))// if not null
Recording: a similar approach, but the flag indicators are replaced by a list of already referenced documents ( a separate list of ids maybe ), and the flag check is a search on this list for this document.
Either way will work, depending on your objects, size and scale. If you cannot change the document objects, you will have to list them. If you have, potentially, 1M documents in your scan, you do not want to list them.
Upvotes: 0
Reputation: 32428
Any tree-traversal algorithm would be fine.
As well as a list of docs you're going to build up, maintain a queue of documents you've yet to check, add the first document to that list.
Then, while the queue isn't empty, get the next doc, if it's not already in your list, then add it, and add all referenced docs to your queue.
List<Document> FoundDocs = new List<Documents();
Queue<Document> DocsToSearch = new Queue<Document>();
DocsToSearch.Enqueue(StartDoc);
while(DocsToSearch.Count != 0)
{
Document Doc = DocsToSearch.Dequeue();
if(!FoundDocs.Contains(Doc))
{
FoundDocs.Add(Doc);
foreach(var ChildDoc in Doc.Children)
{
DocsToSearch.Enqueue(ChildDoc);
}
}
}
Upvotes: 4
Reputation: 133014
The best way is to do a depth first search or a breadth first search
Upvotes: 0