thatandrey

Reputation: 287

Parallel loop in C#, accessing the same variable

I have an Item object with a property called generator_list (a HashSet<string>). I have 8000 objects, and for each object I'd like to see how its generator_list intersects with every other generator_list, and then store the intersection counts in a List<int>, which will logically have 8000 elements.

Run sequentially, the process takes about 8 minutes, and with parallel processing only a few minutes, but I don't think I'm doing the parallel part correctly, hence the question. Can anyone please tell me whether and how I need to modify my code to take advantage of parallel loops?

The code for my Item object is:

public class Item
{
    public int index { get; set; }
    public HashSet<string> generator_list = new HashSet<string>();
}

I stored all my Item objects in a List<Item> items (8000 elements). I created a method that takes in items (the list I want to compare) and one Item (the one I want to compare against). It looks like this:

public void Relatedness2(List<Item> compare, Item compare_to)
        {
            int compare_to_length = compare_to.generator_list.Count;
            foreach (Item block in compare)
            {
                int block_length = block.generator_list.Count;
                int both = 0; //this counts the intersection number
                if (compare_to_length < block_length) //to make sure I'm looping  
                                                      //over the smaller set
                {
                    foreach (string word in compare_to.generator_list)
                    {
                        if (block.generator_list.Contains(word))
                        {
                            both = both + 1;
                        }
                    }
                }
                else
                {
                    foreach (string word in block.generator_list)
                    {
                        if (compare_to.generator_list.Contains(word))
                        {
                            both = both + 1;
                        }
                    }
                }
                     // I'd like to store the intersection number, both,   
                     // somewhere so I can effectively use parallel loops
            }

        }

And finally, my Parallel.ForEach loop is:

Parallel.ForEach(items, (kk, state, index) => Relatedness2(items, kk));

Any suggestions?

Upvotes: 0

Views: 1442

Answers (3)

Guillaume CR

Reputation: 3026

If your Item's index is contiguous and starts at 0, you don't need the Item class at all. Just use a List<HashSet<string>>; the list positions take care of the indexes for you. This solution uses a parallel LINQ (PLINQ) query to find the intersection count between one item and all the others, and then runs that query for every item of the collection inside an outer PLINQ query, like so:

using System.Collections.Generic;
using System.Linq;

var items = new List<HashSet<string>>
{
    new HashSet<string> {"1", "2"},
    new HashSet<string> {"2", "3"},
    new HashSet<string> {"3", "4"},
    new HashSet<string> {"1", "4"}
};

// AsOrdered() makes PLINQ return results in source order, so that
// intersects[i][j] is the count for items[i] and items[j].
// Without it, PLINQ does not guarantee the order of the results.
var intersects = items.AsParallel().AsOrdered().Select(     //Outer loop to run on all items
    item => items.AsParallel().AsOrdered().Select(          //Inner loop to calculate intersects
            item2 => item.Intersect(item2).Count())
            //This ToList will create a single List<int>
            //with the intersects for that item
            .ToList()
        //This ToList will create the final List<List<int>>
        //that contains all intersects.
        ).ToList();
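If you would rather keep the indexes explicit instead of relying on PLINQ result ordering, a similar matrix could be built with Parallel.For writing into a preallocated jagged array. This is a minimal sketch, not part of the original answer: PairwiseIntersections, Compute and CountCommon are hypothetical names, and it assumes the sets are not modified while the loop runs. Each iteration fills only its own row, so no two threads write to the same slot and no lock is needed.

```csharp
using System;
using System.Collections.Generic;
using System.Threading.Tasks;

public static class PairwiseIntersections
{
    // Returns result[i][j] = number of strings shared by sets[i] and sets[j].
    public static int[][] Compute(IReadOnlyList<HashSet<string>> sets)
    {
        int n = sets.Count;
        var result = new int[n][];

        // Each iteration allocates and fills only row i, so threads never
        // write to the same memory location and no synchronization is needed.
        Parallel.For(0, n, i =>
        {
            var row = new int[n];
            for (int j = 0; j < n; j++)
            {
                row[j] = CountCommon(sets[i], sets[j]);
            }
            result[i] = row;
        });

        return result;
    }

    // Iterates over the smaller set and probes the larger one, as in the question.
    private static int CountCommon(HashSet<string> a, HashSet<string> b)
    {
        if (a.Count > b.Count)
        {
            (a, b) = (b, a);
        }

        int count = 0;
        foreach (string word in a)
        {
            if (b.Contains(word))
            {
                count++;
            }
        }
        return count;
    }
}

public static class Program
{
    public static void Main()
    {
        var items = new List<HashSet<string>>
        {
            new HashSet<string> { "1", "2" },
            new HashSet<string> { "2", "3" },
            new HashSet<string> { "3", "4" },
            new HashSet<string> { "1", "4" },
        };

        int[][] counts = PairwiseIntersections.Compute(items);
        foreach (int[] row in counts)
        {
            Console.WriteLine(string.Join(", ", row)); // first row: 2, 1, 0, 1
        }
    }
}
```

Because the rows are assigned by position, counts[i][j] always refers to items[i] and items[j], whatever order the threads finish in.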

Upvotes: 0

Roman G.

Reputation: 302

Maybe something like this

 public Dictionary<int, int> Relatedness2(IList<Item> compare, Item compare_to)
        {
            int compare_to_length = compare_to.generator_list.Count;
            var intersectionData = new Dictionary<int, int>();
            foreach (Item block in compare)
            {
                int block_length = block.generator_list.Count;
                int both = 0;
                if (compare_to_length < block_length)
                {
                    foreach (string word in compare_to.generator_list)
                    {
                        if (block.generator_list.Contains(word))
                        {
                            both = both + 1;
                        }
                    }
                }
                else
                {
                    foreach (string word in block.generator_list)
                    {
                        if (compare_to.generator_list.Contains(word))
                        {
                            both = both + 1;
                        }
                    }
                }
                intersectionData[block.index] = both;
            }
            return intersectionData;
        }

And

using System.Collections.Concurrent;
using System.Threading.Tasks;

List<Item> items = new List<Item>(8000);
// add items to the list
var dictionary = new ConcurrentDictionary<int, Dictionary<int, int>>(); // thread-safe dictionary

var readOnlyItems = items.AsReadOnly(); // if you are sure you won't modify the collection, feel free to use items directly
Parallel.ForEach(readOnlyItems, item =>
{
    dictionary[item.index] = Relatedness2(readOnlyItems, item);
});

I assumed that index is unique.

I used dictionaries, but you may want to use your own classes. In my example you can access the data in the following manner:

var intersectionData = dictionary[1]; // dictionary of intersections for the item with index 1

var countOfIntersectionItemIndex1AndItemIndex3 = dictionary[1][3];
var countOfIntersectionItemIndex3AndItemIndex7 = dictionary[3][7];

Don't forget that dictionary[i] throws a KeyNotFoundException when there is no entry for index i, so use TryGetValue if an index might be missing.
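A minimal sketch of a safe lookup for this nested structure, using TryGetValue at both levels instead of the indexers. IntersectionLookup and TryGetCount are hypothetical names, and the sample data in Main is made up for illustration.

```csharp
using System;
using System.Collections.Concurrent;
using System.Collections.Generic;

public static class IntersectionLookup
{
    // Returns the stored count for the pair (i, j), or null when either
    // index has no entry, instead of letting an indexer throw.
    public static int? TryGetCount(
        ConcurrentDictionary<int, Dictionary<int, int>> data, int i, int j)
    {
        if (data.TryGetValue(i, out Dictionary<int, int> row) &&
            row != null &&
            row.TryGetValue(j, out int count))
        {
            return count;
        }
        return null;
    }
}

public static class Program
{
    public static void Main()
    {
        var data = new ConcurrentDictionary<int, Dictionary<int, int>>();
        data[1] = new Dictionary<int, int> { [1] = 2, [3] = 1 };

        Console.WriteLine(IntersectionLookup.TryGetCount(data, 1, 3));         // 1
        Console.WriteLine(IntersectionLookup.TryGetCount(data, 7, 3) == null); // True
    }
}
```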

Upvotes: 2

Ken Spur

Reputation: 41

Thread-safe collections are probably what you are looking for: http://msdn.microsoft.com/en-us/library/dd997305(v=vs.110).aspx

When working in a multithreaded environment, you need to make sure that several threads are not manipulating shared data at the same time without synchronizing access.
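To illustrate this with a plain List<T>, which is not thread-safe: a minimal sketch in which every Add from a parallel loop is guarded by the same lock. SharedListExample and Collect are hypothetical names, and the compute delegate (i * i in the usage) stands in for the real intersection count.

```csharp
using System;
using System.Collections.Generic;
using System.Threading.Tasks;

public static class SharedListExample
{
    // Collects (index, value) pairs from a parallel loop into one List.
    // List<T> is not thread-safe, so every Add is guarded by the same lock.
    public static List<(int Index, int Value)> Collect(int count, Func<int, int> compute)
    {
        var results = new List<(int Index, int Value)>(count);
        var gate = new object();

        Parallel.For(0, count, i =>
        {
            int value = compute(i); // the expensive work runs outside the lock
            lock (gate)
            {
                results.Add((i, value)); // only the shared mutation is synchronized
            }
        });

        // Threads finish in any order, so sort to restore index order.
        results.Sort((a, b) => a.Index.CompareTo(b.Index));
        return results;
    }
}

public static class Program
{
    public static void Main()
    {
        var squares = SharedListExample.Collect(5, i => i * i);
        foreach (var (index, value) in squares)
        {
            Console.WriteLine($"{index}: {value}");
        }
    }
}
```

Keeping the computation outside the lock matters: if the whole loop body were locked, the threads would run one at a time and the parallel loop would gain nothing.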

The .NET Framework offers some collection classes that are created specifically for use in concurrent environments, which is what you have when you're using multithreading. These collections are thread-safe, which means that they internally use synchronization to make sure that they can be accessed by multiple threads at the same time.

Source: Programming in C# Exam Ref 70-483, Objective 1.1: Implement multithreading and asynchronous processing, Using Concurrent collections

These are the following collections:

  • BlockingCollection<T>
  • ConcurrentBag<T>
  • ConcurrentDictionary<TKey, TValue>
  • ConcurrentQueue<T>
  • ConcurrentStack<T>
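As a sketch of how one of these collections could apply to the question, assuming the generator lists are held in a list of HashSet<string>: each parallel iteration adds its results to a ConcurrentBag<T> without any explicit lock. BagExample and PairCounts are hypothetical names. A bag has no ordering, so the results are sorted by index afterwards.

```csharp
using System;
using System.Collections.Concurrent;
using System.Collections.Generic;
using System.Linq;
using System.Threading.Tasks;

public static class BagExample
{
    // Gathers the intersection count of every ordered pair (i, j).
    // ConcurrentBag.Add is safe to call from many threads at once,
    // but the bag keeps no order, so the results are sorted at the end.
    public static List<(int I, int J, int Common)> PairCounts(IReadOnlyList<HashSet<string>> sets)
    {
        var bag = new ConcurrentBag<(int I, int J, int Common)>();

        Parallel.For(0, sets.Count, i =>
        {
            for (int j = 0; j < sets.Count; j++)
            {
                // Count the strings of sets[i] that also appear in sets[j].
                int common = sets[i].Count(word => sets[j].Contains(word));
                bag.Add((i, j, common));
            }
        });

        return bag.OrderBy(t => t.I).ThenBy(t => t.J).ToList();
    }
}

public static class Program
{
    public static void Main()
    {
        var items = new List<HashSet<string>>
        {
            new HashSet<string> { "1", "2" },
            new HashSet<string> { "2", "3" },
            new HashSet<string> { "3", "4" },
            new HashSet<string> { "1", "4" },
        };

        foreach (var (i, j, common) in BagExample.PairCounts(items))
        {
            Console.WriteLine($"items[{i}] and items[{j}] share {common}");
        }
    }
}
```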

Upvotes: 0
