nick

Reputation: 862

Why doesn't multithreading speed up an mmap task?

I have a big task that needs to read 500 files (50 GB in total).

For every file, I read its contents and do some calculation on the data. It only calculates, nothing else. I can ensure the tasks are independent; they only share some singleton objects that are read, not written (I don't think that is the problem).

Currently, I use mmap to get a pointer to the start of each file's content and loop over it to calculate.
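This is not the real code, only a minimal sketch of the pattern described above, assuming Linux/POSIX. The function name `sum_file_bytes` and the byte sum are hypothetical stand-ins for the actual calculation.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <fcntl.h>
#include <sys/mman.h>
#include <sys/stat.h>
#include <unistd.h>

// Hypothetical stand-in for the real calculation: sums every byte of one file.
// Returns false if the file cannot be opened or mapped.
bool sum_file_bytes(const char* path, std::uint64_t& sum_out) {
    assert(path != nullptr);
    sum_out = 0;

    int fd = open(path, O_RDONLY);
    if (fd < 0) return false;

    struct stat st;
    if (fstat(fd, &st) != 0) { close(fd); return false; }
    if (st.st_size == 0) { close(fd); return true; }  // mmap rejects a zero length

    const std::size_t len = static_cast<std::size_t>(st.st_size);
    void* p = mmap(nullptr, len, PROT_READ, MAP_PRIVATE, fd, 0);
    close(fd);  // the mapping stays valid after the descriptor is closed
    if (p == MAP_FAILED) return false;

    const unsigned char* data = static_cast<const unsigned char*>(p);
    // The first touch of each page may fault and wait for the disk.
    for (std::size_t i = 0; i < len; ++i) sum_out += data[i];

    munmap(p, len);
    return true;
}
```

Each task in the thread pool runs a function of this shape on one file.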

Running the task in a single thread takes 30 s.

Running it in a thread pool with 6 threads takes 35 s.

My machine has 16 GB of memory and a 2.2 GHz CPU with 8 threads.

I have tried a lot of settings and carefully ensured that the tasks are independent.

I am not very good with hardware. Is there a hard I/O limit that caps my speed? Can anyone point me to something I can read about this?

Sorry, the code is too complex, so I can't make a valid demo here.

Upvotes: 0

Views: 295

Answers (1)

Lothar

Reputation: 13067

You can try the MAP_POPULATE flag on mmap to read ahead if you want to load the whole file, or use madvise.
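A minimal sketch of both options, assuming Linux (MAP_POPULATE is Linux-specific, hence the `#ifdef`). The helper name `map_with_readahead` and its `populate` parameter are made up for illustration. The madvise calls are only hints, so measure whether they help on your disk.

```cpp
#include <cassert>
#include <cstddef>
#include <fcntl.h>
#include <sys/mman.h>
#include <sys/stat.h>
#include <unistd.h>

// Hypothetical helper: maps a whole file read-only and asks the kernel to read ahead.
// On success returns the mapping and stores its length in len_out.
// Returns nullptr on failure or for an empty file. The caller must munmap() the result.
const unsigned char* map_with_readahead(const char* path, std::size_t& len_out, bool populate) {
    assert(path != nullptr);
    len_out = 0;

    int fd = open(path, O_RDONLY);
    if (fd < 0) return nullptr;

    struct stat st;
    if (fstat(fd, &st) != 0 || st.st_size == 0) { close(fd); return nullptr; }
    const std::size_t len = static_cast<std::size_t>(st.st_size);

    int flags = MAP_PRIVATE;
#ifdef MAP_POPULATE
    if (populate) flags |= MAP_POPULATE;  // prefault the pages during mmap()
#endif
    void* p = mmap(nullptr, len, PROT_READ, flags, fd, 0);
    close(fd);  // the mapping stays valid after the descriptor is closed
    if (p == MAP_FAILED) return nullptr;

    if (!populate) {
        // Hints only: the kernel may ignore them, so errors are not treated as fatal.
        madvise(p, len, MADV_SEQUENTIAL);  // expect access from start to end
        madvise(p, len, MADV_WILLNEED);    // start reading the range ahead of use
    }

    len_out = len;
    return static_cast<const unsigned char*>(p);
}
```

With `populate` set, the time spent waiting for the disk moves into the mmap call itself, so the calculation loop no longer stalls on page faults.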

The most important hardware detail is not mentioned: whether you read from an SSD or an HDD. I assume you use an SSD, otherwise the thread pool version would be much, much slower.

I don't understand why you use mmap here. There are only three valid reasons to mmap a file. First, the data structure on disk is complex and you want to poke around in it, which is slow because it makes read-ahead much less efficient. Second, you need shared memory between processes. Third, you work on huge files and need the OS to swap data out to the file when the system comes under memory stress (this is the only reason databases use it).

Upvotes: 1
