Reputation: 1414
My platform is Windows Vista 32-bit, with Visual C++ Express 2008.
For example:
If I have a file that contains 4000 bytes, can I have 4 threads read from the file at the same time, with each thread accessing a different section of the file?
Thread 1 reads 0-999, thread 2 reads 1000-1999, etc.
Please give an example in C.
Upvotes: 18
Views: 53236
Reputation: 167
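#include <algorithm>
#include <fstream>
#include <iostream>
#include <mutex>
#include <thread>
#include <vector>
using namespace std;

std::mutex mtx; // guards cout only; each thread reads through its own ifstream

void worker(int n)
{
char * memblock;
// Binary mode keeps seek offsets equal to byte offsets on Windows.
ifstream file ("D:\\test.txt", ios::in | ios::binary);
if (file.is_open())
{
memblock = new char [1000];
file.seekg (n * 999, ios::beg);
file.read (memblock, 999);
memblock[file.gcount()] = '\0'; // terminate after the bytes actually read
file.close();
mtx.lock(); // serialize output so the chunks do not interleave
cout << memblock << endl;
mtx.unlock();
delete[] memblock;
}
else
{
mtx.lock();
cout << "Unable to open file" << endl;
mtx.unlock();
}
}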
int main()
{
vector<std::thread> vec;
for(int i=0; i < 3; i++)
{
vec.push_back(std::thread(&worker,i));
}
std::for_each(vec.begin(), vec.end(), [](std::thread& th)
{
th.join();
});
return 0;
}
Upvotes: 0
Reputation: 43
The easiest way is to open the file separately within each thread, but open it as read-only.
The people who say there may be an I/O bottleneck are probably wrong. Any modern operating system caches file reads, which means the first read of a file will be the slowest and any subsequent reads will be much faster. A 4000-byte file can even fit inside the processor's cache.
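A minimal sketch of the "open it in each thread" idea in portable C stdio, in case it helps. The helper name `read_range` and its signature are made up for illustration; the point is only that every call opens its own read-only stream, so no file position is shared between threads.

```c
#include <assert.h>
#include <stdio.h>

/* Hypothetical helper: read up to `len` bytes starting at byte `offset`.
   Every call opens its own read-only stream, so concurrent callers share
   no file position and need no lock. Returns the number of bytes read,
   or -1 if the file cannot be opened or the seek fails. */
long read_range(const char *path, long offset, unsigned char *buf, size_t len)
{
    FILE *f;
    size_t got;

    assert(path != NULL && buf != NULL && offset >= 0);

    f = fopen(path, "rb");   /* "rb": binary mode, so offsets are byte offsets */
    if (f == NULL)
        return -1;
    if (fseek(f, offset, SEEK_SET) != 0) {
        fclose(f);
        return -1;
    }
    got = fread(buf, 1, len, f);   /* may be short near the end of the file */
    fclose(f);
    return (long)got;
}
```

Thread i would call something like `read_range(path, i * 1000L, buf, 1000)` with its own buffer. How the threads are started (CreateThread, pthreads, ...) is left out here on purpose.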
Upvotes: 3
Reputation: 12008
If you only read from the file and never write to it, you don't need to worry about synchronization or race conditions.
Just open the file with shared reading through a separate handle in each thread and everything will work (i.e., you must open the file in each thread's context instead of sharing the same file handle).
#include <stdio.h>
#include <windows.h>
DWORD WINAPI mythread(LPVOID param)
{
int i = (int)(INT_PTR) param; /* thread index, passed by value */
BYTE buf[1000];
DWORD numread = 0;
/* Each thread opens its own handle, so each has its own file pointer. */
HANDLE h = CreateFileA("c:\\test.txt", GENERIC_READ, FILE_SHARE_READ,
NULL, OPEN_EXISTING, 0, NULL);
if (h == INVALID_HANDLE_VALUE)
return 1;
SetFilePointer(h, i * 1000, NULL, FILE_BEGIN);
if (ReadFile(h, buf, sizeof(buf), &numread, NULL) && numread >= 3)
printf("buf[%d]: %02X %02X %02X\n", i+1, buf[0], buf[1], buf[2]);
CloseHandle(h);
return 0;
}
int main()
{
int i;
HANDLE h[4];
for (i = 0; i < 4; i++)
h[i] = CreateThread(NULL, 0, mythread, (LPVOID)(INT_PTR)i, 0, NULL);
// for (i = 0; i < 4; i++) WaitForSingleObject(h[i], INFINITE);
WaitForMultipleObjects(4, h, TRUE, INFINITE); /* wait for all four threads */
for (i = 0; i < 4; i++)
CloseHandle(h[i]);
return 0;
}
Upvotes: 29
Reputation: 24328
Windows supports overlapped I/O, which allows a single thread to asynchronously queue multiple I/O requests for better performance. This could conceivably be used by multiple threads simultaneously as long as the file you are accessing supports seeking (i.e. this is not a pipe).
Passing FILE_FLAG_OVERLAPPED to CreateFile() allows simultaneous reads and writes on the same file handle; otherwise, Windows serializes them. Specify the file offset using the Offset and OffsetHigh members of the OVERLAPPED structure.
For more information see Synchronization and Overlapped Input and Output.
Upvotes: 4
Reputation: 179779
There's not even a big problem writing to the same file, in all honesty.
By far the easiest way is to just memory-map the file. The OS will then give you a void* where the file is mapped into memory. Cast that to a char*, and make sure that each thread uses a non-overlapping subrange. (The myOS_ functions below are placeholders for whatever your OS provides.)
void foo(char* begin, char*end) { /* .... */ }
void* base_address = myOS_memory_map("example.binary");
myOS_start_thread(&foo, (char*)base_address, (char*)base_address + 1000);
myOS_start_thread(&foo, (char*)base_address+1000, (char*)base_address + 2000);
myOS_start_thread(&foo, (char*)base_address+2000, (char*)base_address + 3000);
Upvotes: 6
Reputation: 64404
As others have noted already, there is no inherent problem in having multiple threads read from the same file, as long as they each have their own file descriptor/handle. However, I'm a little curious about your motives. Why do you want to read a file in parallel? If you're only reading a file into memory, your bottleneck is likely the disk itself, in which case multiple threads won't help you at all (they'll just clutter your code).
And as always when optimizing, you should not attempt it until you (1) have an easy-to-understand, working solution, and (2) have measured your code to know where you should optimize.
Upvotes: 3
Reputation: 46754
Reading: No need to lock the file. Just open the file as read-only or with shared read access.
Writing: Use a mutex to ensure the file is only written to by one thread at a time.
Upvotes: 0
Reputation: 9410
It is possible, though I'm not sure it will be worth the effort. Have you considered reading the entire file into memory within a single thread and then allowing multiple threads to access that data?
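A rough sketch of the loading step in standard C, under the assumption that the file is an ordinary disk file small enough to fit in memory. `load_file` is a hypothetical helper name, not a library function.

```c
#include <assert.h>
#include <stdio.h>
#include <stdlib.h>

/* Hypothetical helper: load a whole file into one heap buffer.
   On success returns the buffer (the caller frees it) and stores the
   byte count in *size; on failure returns NULL. */
unsigned char *load_file(const char *path, size_t *size)
{
    FILE *f;
    long n;
    unsigned char *buf;

    assert(path != NULL && size != NULL);

    f = fopen(path, "rb");
    if (f == NULL)
        return NULL;
    /* Find the length by seeking to the end; fine for ordinary disk files. */
    if (fseek(f, 0, SEEK_END) != 0 || (n = ftell(f)) < 0 ||
        fseek(f, 0, SEEK_SET) != 0) {
        fclose(f);
        return NULL;
    }
    buf = (unsigned char *)malloc(n > 0 ? (size_t)n : 1);
    if (buf == NULL || fread(buf, 1, (size_t)n, f) != (size_t)n) {
        free(buf);
        fclose(f);
        return NULL;
    }
    fclose(f);
    *size = (size_t)n;
    return buf;
}
```

Once the buffer is loaded, worker i can be handed `buf + i * 1000` and a length. As long as the workers only read the buffer, they need no locking among themselves.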
Upvotes: 1
Reputation: 264331
I don't see any real advantage to doing this.
You may have multiple threads reading from the device, but your bottleneck will not be the CPU but rather disk I/O speed.
If you are not careful you may even slow the process down (but you will need to measure it to know for certain).
Upvotes: 2
Reputation: 7324
You shouldn't need to do anything particularly clever if all the threads are doing is reading. You can read the file as many times in parallel as you like, as long as you don't lock it exclusively. Writing is another matter, of course...
I do have to wonder why you'd want to, though. It will likely perform badly, since your HDD will waste a lot of time seeking back and forth rather than reading the file in one (relatively) uninterrupted sweep. For small files (like your 4000-byte example), where that might not be such a problem, it doesn't seem worth the trouble.
Upvotes: 1
Reputation: 3992
He wants to read from a file in different threads. I guess that should be ok if the file is opened as read-only by each thread.
I hope you don't want to do this for performance though, since you will have to scan large parts of the file for newline characters in each thread.
Upvotes: -1
Reputation: 1126
You need a way to synchronize those threads. There are several ways to implement mutual exclusion: http://en.wikipedia.org/wiki/Mutual_exclusion
Upvotes: -1
Reputation: 113
You can certainly have multiple threads reading from a data structure; race conditions can occur only if some writing is taking place.
To avoid such race conditions, you need to define the boundaries within which each thread may read. If you have an explicit number of data segments and a matching number of threads, that is easy.
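One way to pin those boundaries down, sketched in plain C with no threading library involved. `segment_range` is a hypothetical helper; it only does the arithmetic of splitting a byte count into contiguous, non-overlapping ranges.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical helper: byte range of segment `index` when `total` bytes
   are split into `parts` contiguous, non-overlapping segments. Segment
   sizes differ by at most one byte; the first `total % parts` segments
   get the extra byte. */
void segment_range(size_t total, size_t parts, size_t index,
                   size_t *offset, size_t *length)
{
    size_t base, extra;

    assert(parts > 0 && index < parts);
    assert(offset != NULL && length != NULL);

    base = total / parts;    /* bytes every segment gets */
    extra = total % parts;   /* leftover bytes, one each for the first segments */
    *offset = index * base + (index < extra ? index : extra);
    *length = base + (index < extra ? 1 : 0);
}
```

For a 4000-byte file and 4 threads this gives offsets 0, 1000, 2000 and 3000, each with length 1000, which matches the split described in the question.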
As for an example in C, you would need to provide some more information, such as the threading library you are using. Attempt it first, then we can help you fix any issues.
Upvotes: 2