EpicKip

Reputation: 4043

Find nesting and sort

Situation:
I have 'pages' that link to other pages. These pages will be stored in a database, so to store one page I first need to store the pages it links to (for the foreign key).
So what I need to do is find all the pages linked from the selected page, all the way up to the root.

Example:
A - links to - B & C & D
B - links to - C & F
C - links to - E & F
D - links to - B & C
E - links to - F
F - links to - No links

The correct nesting order in this example would be:
F - E - C - B - D - A

Note: usually I have closer to 30 pages, with links all over the place.

Problem:
I already have code for this that works, but getting the links from each page takes a while (800 ms on average), so I would like to fetch the links for each page as few times as possible.

Code example: (not production code)

using System;
using System.Collections.Generic;
using System.Diagnostics;
using System.Linq;
using System.Threading;

static class Program
{
    public static Dictionary<int, int[]> Dict = new Dictionary<int, int[]>();
    private static int hitcount = 0;
    static void Main(string[] args)
    {
        //Example data
        Stopwatch sw = new Stopwatch();
        sw.Start();
        Dict.Add(1, new int[] { 2, 3, 4 });
        Dict.Add(2, new int[] { 3, 6 });
        Dict.Add(3, new int[] { 5, 6 });
        Dict.Add(4, new int[] { 2, 3 });
        Dict.Add(5, new int[] { 6 });
        Dict.Add(6, new int[] { });

        var links = GetAllLinks(1);
        foreach ( var link in links )
        {
            Console.WriteLine(link.ToString());
        }
        sw.Stop();
        Console.WriteLine("MS:" + sw.ElapsedMilliseconds + " - " + hitcount);
        Console.ReadKey();
    }


    private static List<int> GetLinksFromKey(int key)
    {
        //This usually takes avg 800ms, so sleep here to simulate it
        //This is the bottleneck: the more often it's called, the longer the run takes
        Thread.Sleep(800);
        hitcount++;
        return Dict[key].ToList();
    }

    //Collect every page reachable from key, then hand the set to SortLinks
    private static List<int> GetAllLinks(int key)
    {
        var allPages = new List<int>();
        var pages = new List<int>();
        pages.Add(key);
        while (true)
        {
            var i = 0;
            var newP = new List<int>();
            newP.AddRange(pages);
            foreach (var page in pages.Distinct())
            {
                if (allPages.Contains(page))
                {
                    continue;
                }
                newP.AddRange(GetLinksFromKey(page));
                i++;
                allPages.Add(page);
            }
            pages = new List<int>(newP);
            if (i == 0)
            {
                break;
            }
        }
        return SortLinks(new List<int>(allPages.Distinct()));
    }

    //Repeatedly scan the pages, emitting each page once all of its links are already sorted
    private static List<int> SortLinks(List<int> pagesToSort)
    {
        var sortedPages = new List<int>();
        var hasReference = new List<int>();
        while (sortedPages.Count != pagesToSort.Count)
        {
            foreach (var page in pagesToSort)
            {
                if (sortedPages.Contains(page))
                {
                    continue;
                }
                var links = GetLinksFromKey(page);
                if (new List<int>(links.Distinct()).RemoveListFromList(sortedPages).Count == 0)
                {
                    sortedPages.Add(page);
                }
                else
                {
                    hasReference.Add(page);
                }
                if (hasReference.Distinct().Count() == pagesToSort.Distinct().Count())
                {
                    Console.WriteLine("There are circular references, can't find the root.");
                    return sortedPages;
                }
            }
        }
        return sortedPages;
    }
    private static List<int> RemoveListFromList(this List<int> mainList, List<int> removeList)
    {
        foreach (var item in removeList)
        {
            if (mainList.Contains(item))
            {
                mainList.Remove(item);
            }
        }
        return mainList;
    }
}

I made this code as an example of my nesting situation; it works against a dictionary instead of real pages. The dictionary lookup itself is very fast, but I know which method the bottleneck is in, so if anyone has a solution that calls it less often, that would be great.

Question:
Is there any way to make this more efficient? I feel like I'm doing it wrong.
If there isn't a faster way to do this, I would appreciate hearing that too.

If you can get hitcount below 20 and produce the same output as me (6 - 5 - 3 - 2 - 4 - 1), you have already improved it.

Upvotes: 2

Views: 77

Answers (1)

xanatos

Reputation: 111890

With recursion it should be:

public static Dictionary<int, int[]> Dict = new Dictionary<int, int[]>();
private static int hitcount = 0;

static void Main(string[] args)
{
    //Example data
    Stopwatch sw = new Stopwatch();
    sw.Start();

    Dict.Add(1, new int[] { 2, 3, 4 });
    Dict.Add(2, new int[] { 3, 6 });
    Dict.Add(3, new int[] { 5, 6 });
    Dict.Add(4, new int[] { 2, 3 });
    Dict.Add(5, new int[] { 6 });
    Dict.Add(6, new int[] { });

    var links = GetAllLinks(1);

    foreach (var link in links)
    {
        Console.WriteLine(link.ToString());
    }

    sw.Stop();
    Console.WriteLine("MS:" + sw.ElapsedMilliseconds + " - " + hitcount);
    Console.ReadKey();
}


private static List<int> GetLinksFromKey(int key)
{
    //This usually takes avg 800ms, so sleep here to simulate it
    //This is the bottleneck: the more often it's called, the longer the run takes
    Thread.Sleep(800);
    hitcount++;
    return Dict[key].ToList();
}

private static List<int> GetAllLinks(int key)
{
    var alreadyDone = new HashSet<int>();
    var workingOn = new HashSet<int>();

    var pages = new List<int>();

    RecursiveGetAllLinks(pages, alreadyDone, workingOn, key);

    return pages;
}

private static void RecursiveGetAllLinks(List<int> pages, HashSet<int> alreadyDone, HashSet<int> workingOn, int key)
{
    //If key is already on the current path, we've looped back into it: a cycle
    if (!workingOn.Add(key))
    {
        throw new Exception("Cyclic recursion for " + key);
    }

    var links = GetLinksFromKey(key);

    foreach (int link in links)
    {
        if (alreadyDone.Contains(link))
        {
            continue;
        }

        RecursiveGetAllLinks(pages, alreadyDone, workingOn, link);
    }

    //Post-order: a page is added only after every page it links to
    alreadyDone.Add(key);
    pages.Add(key);

    workingOn.Remove(key);
}

I have two HashSet<>s: one for the pages I've already completely parsed (with all their links) and one for the pages I'm currently parsing (to check for cyclic references). At the moment I throw an Exception() when a cycle is detected, but it would be possible to return false instead and check for that value at every call site.
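For reference, a minimal sketch of that non-throwing variant (TryRecursiveGetAllLinks is just a name picked for illustration; it assumes the same fields and GetLinksFromKey as above):

private static bool TryRecursiveGetAllLinks(List<int> pages, HashSet<int> alreadyDone, HashSet<int> workingOn, int key)
{
    if (!workingOn.Add(key))
    {
        return false; //key is already on the current path: cyclic reference
    }

    foreach (int link in GetLinksFromKey(key))
    {
        if (alreadyDone.Contains(link))
        {
            continue;
        }

        if (!TryRecursiveGetAllLinks(pages, alreadyDone, workingOn, link))
        {
            return false; //propagate the failure up the call chain
        }
    }

    //Same post-order bookkeeping as the throwing version
    alreadyDone.Add(key);
    pages.Add(key);
    workingOn.Remove(key);
    return true;
}

The caller would then check the returned bool instead of catching an exception.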

This piece of code is O(N) in the number of calls to GetLinksFromKey(): if there are N pages in total (6 in the example), it does the loading exactly N times.
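With the six example pages above, this should print 6, 5, 3, 2, 4, 1 (the output the question asks for) with hitcount == 6, so roughly 6 x 800 ms of simulated loading.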

Note that this code is recursive... Recursion is a double-edged sword... but normally something like this shouldn't cause a StackOverflowException (you won't have links nested 10,000 deep).
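If very deep chains ever did become a concern, the same post-order walk can be driven by an explicit Stack<> instead of the call stack. A sketch under that assumption (GetAllLinksIterative is a hypothetical name; it reuses GetLinksFromKey from above and still loads each page exactly once):

private static List<int> GetAllLinksIterative(int key)
{
    var pages = new List<int>();
    var alreadyDone = new HashSet<int>();
    var workingOn = new HashSet<int>();

    var stack = new Stack<(int Key, bool Expanded)>();
    stack.Push((key, false));

    while (stack.Count > 0)
    {
        var (current, expanded) = stack.Pop();

        if (expanded)
        {
            //All links of current have been emitted, so emit it now (post-order)
            workingOn.Remove(current);
            alreadyDone.Add(current);
            pages.Add(current);
            continue;
        }

        if (alreadyDone.Contains(current))
        {
            continue; //already fully processed via another path
        }

        if (!workingOn.Add(current))
        {
            throw new Exception("Cyclic recursion for " + current);
        }

        stack.Push((current, true)); //revisit after its links are done

        foreach (int link in GetLinksFromKey(current))
        {
            if (!alreadyDone.Contains(link))
            {
                stack.Push((link, false));
            }
        }
    }

    return pages;
}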

Upvotes: 1
