Rauno Veberson
Rauno Veberson

Reputation: 139

How to optimize nested looping or the method that is executed in every iteration?

I understand that the title of this question doesn't say much about the problem I'm struggling with. I have a text file filled with purchase orders from a online bookstore. This text file is about 900,000 lines long and each line contains two fields separated by comma (customer_id,book_id). I wanted to do some datamining and thought it would be fun to find out some statistics about books so I created two methods. GetOrderCount(string x, string y) and AllPairs(). First one calculates how many customers bought two specific books together and the second one calculates all possible pairs (all size 2 combinations). However this takes very long time to run. Looking at the code is there something specific that could take a long time? And is the nested loop in AllPairs() complex enough that it would justify using parallel For? Also I picked some of the structures so that it would make better sense but they might not be meant for these kind of operations. Any pointers towards why this code is so slow would be perfection.

using System;
using System.Collections.Generic;
using System.Diagnostics;
using System.IO;
using System.Linq;
using System.Text;
using System.Threading.Tasks;

namespace BookStats
{
class Order
{
    Dictionary<int, HashSet<String>> orders;
    List<string> books;

    public Order(String path)
    {
        orders = GetOrders(path, out books);
    }

    private Dictionary<int, HashSet<string>> GetOrders(string path, out List<string> distinctBooks)
    {
        Dictionary<int, HashSet<string>> items = new Dictionary<int, HashSet<string>>();
        distinctBooks = new List<string>();
        List<string> allBooks = new List<string>();
        using (StreamReader sr = File.OpenText(path))
        {
            string s = String.Empty;
            while ((s = sr.ReadLine()) != null)
            {
                string[] line = s.Split(',');
                try
                {
                    int id = int.Parse(line[0]);
                    allBooks.Add(line[1]);
                    if (items.ContainsKey(id))
                    {
                        items[id].Add(line[1]);
                    }
                    else
                    {
                        HashSet<string> customerBooks = new HashSet<string>();
                        customerBooks.Add(line[1]);
                        items.Add(id, customerBooks);
                    }
                }
                catch{ }
            }
        }
        distinctBooks.AddRange(allBooks.Distinct());
        return items;
    }

    public int GetOrderCount(string x, string y){
        int count = 0;
        foreach (KeyValuePair<int,HashSet<string>> order in orders)
        {
            var receipt = order.Value;
            if (receipt.Contains(x) && receipt.Contains(y))
            {
                count++;
            }
        }
        return count;
    }

    public void GetAllPairs()
    {
        Stopwatch watch = new Stopwatch();
        watch.Start();
        for (int i = 0; i < books.Count; i++)
        {
            for (int j = i+1; j < books.Count;j++)
            {
                int count = GetOrderCount(books[i], books[j]);
                Console.WriteLine(j);

            }
            Console.WriteLine(watch.Elapsed);
        }
    }

    public int GetBookCount() {
        return books.Count;
    }

    public void GetCustomerPurchase(int id)
    {
        foreach (string s in orders[id])
        {
            System.Console.WriteLine("Raamat " + s);
        }
    }



}

}

EDITED: Edited the code to match suggestions given by @Chris and @Anony-Mousse

Upvotes: 0

Views: 69

Answers (1)

Has QUIT--Anony-Mousse
Has QUIT--Anony-Mousse

Reputation: 77464

Your loops are actually four levels deep (third loop is in “GetOrdersCount” and the fourth is “Contains”). Thats probably what makes it slow. Use a profiler to see where you need to optimize.

For a starter, replace

Dictionary<int, List<String>> orders;

with

Dictionary<int, Set<String>> orders;

And do the necessary changes to the code.

Build optimized data structures such as inverted sorted lists to accelerate the costly operations. A set is faster than a list for “Contains” for example, too.

Upvotes: 1

Related Questions