Reputation: 287
I have an Item object with a property called generator_list (a HashSet of strings). I have 8000 objects, and for each object I'd like to see how its generator_list intersects with every other generator_list, and then store the intersection counts in a List<int>, which will logically have 8000 elements.
The process takes about 8 minutes sequentially and only a few minutes with parallel processing, but I don't think I'm doing the parallel part right, hence the question. Can anyone tell me whether and how I need to modify my code to take advantage of the parallel loops?
The code for my Item object is:
public class Item
{
    public int index { get; set; }
    public HashSet<string> generator_list = new HashSet<string>();
}
I stored all my Item objects in a List<Item> called items (8000 elements). I created a method that takes items (the list I want to compare) and one Item (the item I want to compare against); it looks like this:
public void Relatedness2(List<Item> compare, Item compare_to)
{
    int compare_to_length = compare_to.generator_list.Count;
    foreach (Item block in compare)
    {
        int block_length = block.generator_list.Count;
        int both = 0; // this counts the intersection number
        if (compare_to_length < block_length) // to make sure I'm looping over the smaller set
        {
            foreach (string word in compare_to.generator_list)
            {
                if (block.generator_list.Contains(word))
                {
                    both = both + 1;
                }
            }
        }
        else
        {
            foreach (string word in block.generator_list)
            {
                if (compare_to.generator_list.Contains(word))
                {
                    both = both + 1;
                }
            }
        }
        // I'd like to store the intersection number, both,
        // somewhere so I can effectively use parallel loops
    }
}
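The counting logic inside Relatedness2 can be pulled into a small helper, which makes it easier to call from any kind of loop, sequential or parallel. A minimal sketch; SetMath and IntersectionCount are hypothetical names, not part of the original code:

```csharp
using System.Collections.Generic;

// Hypothetical helper class for this sketch.
public static class SetMath
{
    // Counts how many strings two sets share. Like Relatedness2, it iterates
    // over the smaller set and probes the larger one with Contains.
    public static int IntersectionCount(HashSet<string> a, HashSet<string> b)
    {
        // Pick the set with fewer elements to loop over.
        var (small, large) = a.Count <= b.Count ? (a, b) : (b, a);

        int both = 0;
        foreach (string word in small)
        {
            // HashSet<T>.Contains is O(1) on average.
            if (large.Contains(word))
            {
                both++;
            }
        }
        return both;
    }
}
```

The helper only reads the two sets; it assumes no other thread modifies a generator_list while comparisons are running.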
And finally, my Parallel.ForEach loop is:
Parallel.ForEach(items, (kk, state, index) => Relatedness2(items, kk));
Any suggestions?
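One way to store every count without locks or concurrent collections is to preallocate one result row per item and let each parallel iteration fill only its own row, so no two threads ever write to the same slot. A minimal sketch, assuming the items list is fully built before the loop starts; Relatedness and AllIntersectionCounts are hypothetical names, and Item mirrors the class from the question:

```csharp
using System.Collections.Generic;
using System.Threading.Tasks;

// Mirrors the Item class from the question.
public class Item
{
    public int index { get; set; }
    public HashSet<string> generator_list = new HashSet<string>();
}

// Hypothetical class name for this sketch.
public static class Relatedness
{
    // Returns counts[i][j] = size of the intersection of items[i] and items[j].
    // Iteration i writes only counts[i], so no synchronization is needed.
    public static int[][] AllIntersectionCounts(IReadOnlyList<Item> items)
    {
        int n = items.Count;
        var counts = new int[n][];

        Parallel.For(0, n, i =>
        {
            var row = new int[n];
            HashSet<string> mine = items[i].generator_list;
            for (int j = 0; j < n; j++)
            {
                HashSet<string> other = items[j].generator_list;
                // Loop over the smaller set, as Relatedness2 does.
                var (small, large) = mine.Count <= other.Count ? (mine, other) : (other, mine);
                int both = 0;
                foreach (string word in small)
                {
                    if (large.Contains(word)) both++;
                }
                row[j] = both;
            }
            counts[i] = row; // each thread writes a different array element
        });

        return counts;
    }
}
```

With 8000 items this produces 8000 rows of 8000 ints, roughly 256 MB, which is worth keeping in mind whichever container holds the results.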
Upvotes: 0
Views: 1442
Reputation: 3026
If your Item's index values are contiguous and start at 0, you don't need the Item class at all. Just use a List<HashSet<string>>; the list position takes care of the indexes for you. This solution finds the intersection count between one item and all the others in a parallel LINQ query, then runs that for every item in your collection in an outer parallel LINQ query, like so:
var items = new List<HashSet<string>>
{
    new HashSet<string> { "1", "2" },
    new HashSet<string> { "2", "3" },
    new HashSet<string> { "3", "4" },
    new HashSet<string> { "1", "4" }
};
// AsOrdered() keeps the results in the same order as items;
// without it, PLINQ does not guarantee the output order.
var intersects = items.AsParallel().AsOrdered().Select( // Outer loop to run on all items
    item => items.AsParallel().AsOrdered().Select( // Inner loop to calculate intersects
        item2 => item.Intersect(item2).Count())
        // This ToList will create a single List<int>
        // with the intersects for that item
        .ToList()
    // This ToList will create the final List<List<int>>
    // that contains all intersects.
).ToList();
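A self-contained variant of the PLINQ approach, wrapped in a method so it can be reused. It is a sketch under two choices that differ from the snippet above: only the outer query is parallel (with thousands of sets, parallelizing the outer loop is usually enough and avoids nested PLINQ overhead), and AsOrdered() keeps result[i][j] aligned with items[i] and items[j]. PlinqIntersects and Compute are hypothetical names:

```csharp
using System.Collections.Generic;
using System.Linq;

// Hypothetical class name for this sketch.
public static class PlinqIntersects
{
    // Returns one List<int> of intersection counts per set, in source order.
    public static List<List<int>> Compute(List<HashSet<string>> items)
    {
        return items
            .AsParallel()
            .AsOrdered() // preserve the order of items in the output
            .Select(item => items
                // Sequential inner loop over all sets for this item.
                .Select(item2 => item.Intersect(item2).Count())
                .ToList())
            .ToList();
    }
}
```

For the four sets in the example above, the rows come out as [2, 1, 0, 1], [1, 2, 1, 0], [0, 1, 2, 1] and [1, 0, 1, 2].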
Upvotes: 0
Reputation: 302
Maybe something like this:
public Dictionary<int, int> Relatedness2(IList<Item> compare, Item compare_to)
{
    int compare_to_length = compare_to.generator_list.Count;
    var intersectionData = new Dictionary<int, int>();
    foreach (Item block in compare)
    {
        int block_length = block.generator_list.Count;
        int both = 0;
        if (compare_to_length < block_length)
        {
            foreach (string word in compare_to.generator_list)
            {
                if (block.generator_list.Contains(word))
                {
                    both = both + 1;
                }
            }
        }
        else
        {
            foreach (string word in block.generator_list)
            {
                if (compare_to.generator_list.Contains(word))
                {
                    both = both + 1;
                }
            }
        }
        intersectionData[block.index] = both;
    }
    return intersectionData;
}
And:
List<Item> items = new List<Item>(8000);
// add items to the list
var dictionary = new ConcurrentDictionary<int, Dictionary<int, int>>(); // thread-safe dictionary
var readOnlyItems = items.AsReadOnly(); // if you're sure you won't modify the collection, you can use items directly
Parallel.ForEach(readOnlyItems, item =>
{
    dictionary[item.index] = Relatedness2(readOnlyItems, item);
});
I assumed that index is unique.
I used dictionaries, but you may want to use your own classes. In my example, you can access the data as follows:
var intersectionData = dictionary[1]; // dictionary of intersections for the item with index 1
var countOfIntersectionItemIndex1AndItemIndex3 = dictionary[1][3];
var countOfIntersectionItemIndex3AndItemIndex7 = dictionary[3][7];
Keep in mind that the indexer throws a KeyNotFoundException if there is no entry for a given index; use TryGetValue if that can happen.
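Because the dictionary indexer throws KeyNotFoundException for a missing key, a lookup that tolerates gaps can use TryGetValue on both levels. A minimal sketch; IntersectionLookup and TryGetCount are hypothetical names, and the nested dictionary shape matches the one used above:

```csharp
using System.Collections.Concurrent;
using System.Collections.Generic;

// Hypothetical helper class for this sketch.
public static class IntersectionLookup
{
    // Returns the stored intersection count for the items with indexes a and b,
    // or null if either index has no entry.
    public static int? TryGetCount(
        ConcurrentDictionary<int, Dictionary<int, int>> results, int a, int b)
    {
        if (results.TryGetValue(a, out Dictionary<int, int> row) &&
            row.TryGetValue(b, out int count))
        {
            return count;
        }
        return null;
    }
}
```

Returning int? keeps "no data" distinct from a genuine intersection count of 0.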
Upvotes: 2
Reputation: 41
Thread-safe collections are probably what you are looking for: http://msdn.microsoft.com/en-us/library/dd997305(v=vs.110).aspx.
When working in a multithreaded environment, you need to make sure that several threads are not manipulating shared data at the same time without synchronizing access.
The .NET Framework offers some collection classes that are created specifically for use in concurrent environments, which is what you have when you're using multithreading. These collections are thread-safe, which means that they internally use synchronization to make sure that they can be accessed by multiple threads at the same time.
Source: Programming in C# Exam Ref 70-483, Objective 1.1: Implement multithreading and asynchronous processing, Using Concurrent collections
The collections are:
BlockingCollection<T>
ConcurrentBag<T>
ConcurrentDictionary<TKey, TValue>
ConcurrentQueue<T>
ConcurrentStack<T>
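As an illustration of one collection from the list above, the sketch below compares every set against a single target set in parallel and collects (index, count) pairs in a ConcurrentBag<T>, which is safe to Add to from many threads. A bag keeps no ordering, so the pairs are sorted by index at the end. ConcurrentCollect and CountsAgainst are hypothetical names:

```csharp
using System.Collections.Concurrent;
using System.Collections.Generic;
using System.Linq;
using System.Threading.Tasks;

// Hypothetical class name for this sketch.
public static class ConcurrentCollect
{
    // For every set, counts how many strings it shares with 'target'.
    public static List<int> CountsAgainst(List<HashSet<string>> sets, HashSet<string> target)
    {
        var bag = new ConcurrentBag<(int Index, int Both)>();

        Parallel.For(0, sets.Count, i =>
        {
            int both = 0;
            foreach (string word in sets[i])
            {
                if (target.Contains(word)) both++;
            }
            bag.Add((i, both)); // thread-safe add from any iteration
        });

        // The bag is unordered, so restore source order by index.
        return bag.OrderBy(p => p.Index).Select(p => p.Both).ToList();
    }
}
```

When each iteration owns a distinct index, a plain preallocated array avoids the sorting step; the bag is most useful when iterations produce results that don't map to fixed slots.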
Upvotes: 0