Reputation: 139
I'm creating an application where I want to rename a bunch of files in a bunch of folders based on the content of an xlsx file.
I'm making a parallel for loop where each folder gets its own "thread" (or whatchamacallit). The application should then, based on the folder name, retrieve all the posts in the xlsx file with the corresponding folder name and rename the contents of the folder accordingly. I hope this makes sense.
My question is: when should I read the xlsx file? As I see it, I have two options:
1) Open the file once, before the parallel iteration, and have each iteration loop through the file's contents looking for its folder name. A possible issue is that multiple threads would be checking the same array simultaneously; I don't know if that could fudge stuff up.
2) Open the file once per iteration and loop through it to find results. I'm thinking that opening the file multiple times would be more time-consuming than it has to be.
The xlsx file has about 48000 rows of data.
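To illustrate option 1, here's a rough sketch of what I have in mind. ReadXlsxRows is a placeholder for whatever xlsx library I end up using (EPPlus, ClosedXML, etc.), and the row shape is made up; the point is to read the sheet once, group the rows by folder name, and let each folder do a cheap lookup instead of rescanning ~48,000 rows:

// Hypothetical helper: one record per spreadsheet row.
IEnumerable<(string Folder, string OldName, string NewName)> rows = ReadXlsxRows(xlsxPath);

// Group once up front; each folder's rows are then found without rescanning the sheet.
var byFolder = rows.ToLookup(r => r.Folder);

foreach (var dir in Directory.EnumerateDirectories(rootPath))
{
    var folderName = Path.GetFileName(dir);
    foreach (var row in byFolder[folderName]) // empty sequence if the folder has no rows
        File.Move(Path.Combine(dir, row.OldName), Path.Combine(dir, row.NewName));
}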
EDIT:
I've dropped the parallel for loop and gone with a regular one, since comments and answers advised against it and explained why. But I'll leave it in the question for others to find.
The question is now: When should I open the xlsx file? (for details see pre-edit)
Upvotes: 2
Views: 2406
Reputation: 117175
I ran some tests to see what kind of performance improvement you might get, if any. I decided to create 10,000 files and, using Stopwatch, time how long it took to rename them with both a single-threaded and a multi-threaded approach.
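These snippets call ForEach on IEnumerable<T>, which isn't part of standard LINQ, so I'm assuming an extension method along these lines (a minimal sketch):

public static class EnumerableExtensions
{
    // Runs an action against every element of a sequence.
    public static void ForEach<T>(this IEnumerable<T> source, Action<T> action)
    {
        foreach (var item in source)
            action(item);
    }
}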
Here's the code:
//var path = @"D:\Users\Enigmativity\Temporary\SOTest"; //HDD
var path = @"C:\_temporary\SOTest"; //SSD
var files = 10000;
var format = "00000";
var rnd = new Random();
Enumerable
.Range(0, files)
.OrderBy(n => rnd.NextDouble())
.ForEach(n => File.WriteAllText(System.IO.Path.Combine(path, n.ToString(format) + ".txt"), n.ToString()));
I then ran this:
var sw = Stopwatch.StartNew();
Enumerable
    .Range(0, files)
    .ToList()
    .ForEach(n =>
        System.IO.File.Move(
            System.IO.Path.Combine(path, n.ToString(format) + ".txt"),
            System.IO.Path.Combine(path, n.ToString(format) + n.ToString(format) + ".txt")));
sw.Stop();
And compared it to this:
var sw = Stopwatch.StartNew();
Enumerable
    .Range(0, files)
    .GroupBy(x => 10 * x / files) // split the 10,000 renames into 10 batches
    .AsParallel()
    .ForAll(ns =>
        ns
            .ToList()
            .ForEach(n =>
                System.IO.File.Move(
                    System.IO.Path.Combine(path, n.ToString(format) + ".txt"),
                    System.IO.Path.Combine(path, n.ToString(format) + n.ToString(format) + ".txt"))));
sw.Stop();
At the end of each run I cleaned up the files:
Directory.EnumerateFiles(path).ForEach(x => File.Delete(x));
My results were:
Single thread on HDD: 2,155 milliseconds
Multi-threads on HDD: 1,601 milliseconds
Single thread on SSD: 2,457 milliseconds
Multi-threads on SSD: 940 milliseconds
I ran these tests numerous times and each run took roughly the same time. Running in parallel gave a huge benefit on the SSD and a moderate benefit on the HDD.
Upvotes: 1
Reputation: 3333
You should not use multithreading for I/O-bound operations. Even with a really fast storage device such as an SSD or a RAID array, you won't get much of a performance boost from multithreading, and on regular HDDs performance actually gets worse. Try copying multiple files or extracting multiple zip archives simultaneously, for example: you will quickly notice the performance drop, because multiple threads constantly fight over a single I/O device.
Upvotes: 1
Reputation: 171246
Just reading from a data structure is safe to do concurrently, so that's not an issue here. The issue I see is that if you don't do any preprocessing on the list, you will scan it many times from many threads, which is wasteful. What about this:
var excelItems = ...; // Fill this in.
var groupedByFolder = excelItems.GroupBy(x => x.directoryName);
// ProcessFolder renames the files in one folder's group.
groupedByFolder.AsParallel().ForAll(g => ProcessFolder(g));
This traverses the data just once and is clean, simple code.
You also need to configure AsParallel with an empirically determined degree of parallelism. Try different values.
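For instance, with PLINQ's standard WithDegreeOfParallelism knob (the value 4 here is just a placeholder to tune):

groupedByFolder
    .AsParallel()
    .WithDegreeOfParallelism(4) // placeholder; measure and adjust for your hardware
    .ForAll(g => ProcessFolder(g));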
Upvotes: 0