jsstuball

Number of threads not affecting disk read rate?

I am dumbfounded by the results of reading 4 very large CSV files into a dataframe in Python. First I performed the reads with a single thread, in series (read the first CSV, then the second, and so on). That took 230 s.

With 4 threads, each reading one CSV "in parallel", it takes 220 s. With 2 threads it also takes 220 s.

I can't explain this, because no whole number of disk read heads seems consistent with it. If there were a single head, both the 2-thread and 4-thread versions should take significantly longer, since the head would constantly seek between addresses as the threads are switched. If there were 2 or 4 read heads, surely both multithreaded versions would outperform the single-threaded one?
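
A minimal sketch for reproducing the comparison, assuming standard-library threads via `concurrent.futures`. The default reader only pulls raw bytes, so disk I/O is timed separately from CSV parsing; passing `reader=pandas.read_csv` would time the full dataframe load instead. The names `time_reads` and `read_whole_file` are hypothetical, not taken from the question.

```python
import time
from concurrent.futures import ThreadPoolExecutor
from pathlib import Path


def read_whole_file(path):
    """Read a file's raw bytes (hypothetical stand-in for pandas.read_csv, isolating disk I/O)."""
    return Path(path).read_bytes()


def time_reads(paths, workers=1, reader=read_whole_file):
    """Read every file in `paths` and return (elapsed_seconds, results).

    workers=1 reads the files one after another (the serial case);
    workers>1 hands one read per file to a thread pool of that size.
    """
    start = time.perf_counter()
    if workers == 1:
        results = [reader(p) for p in paths]
    else:
        with ThreadPoolExecutor(max_workers=workers) as pool:
            # map() preserves input order, so results line up with paths
            results = list(pool.map(reader, paths))
    elapsed = time.perf_counter() - start
    return elapsed, results
```

For example, call `time_reads(csv_paths, workers=n)` for `n` in 1, 2 and 4, where `csv_paths` lists the four CSV files. Repeated runs may be served from the OS page cache rather than from the disk, so a first run and later runs are not directly comparable.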

Answers (1)

Jutorres

Access to the disk is managed by the OS, so if you are trying to read in parallel from the same disk you won't get a real improvement. I'm not really sure about having several read heads, but if the files are on different disks, reading them in parallel will help.
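
To act on the point about separate disks, one possible approach is to group the files by the device they live on, give each device its own thread, and read files that share a device one after another. This is a sketch under assumptions: `os.stat().st_dev` identifies the filesystem's device, which is not always one physical disk (two partitions of one drive get different IDs, and RAID or LVM can put several drives behind one ID), and the helper names are hypothetical.

```python
import os
from collections import defaultdict
from concurrent.futures import ThreadPoolExecutor
from pathlib import Path


def group_by_device(paths):
    """Map each device ID (os.stat().st_dev) to the paths stored on it.

    Assumption: one st_dev roughly corresponds to one physical disk.
    """
    groups = defaultdict(list)
    for p in paths:
        groups[os.stat(p).st_dev].append(p)
    return dict(groups)


def read_group(paths):
    """Read one device's files in series, so that device serves one stream at a time."""
    return {p: Path(p).read_bytes() for p in paths}


def read_per_device(paths):
    """Run one worker thread per device; files sharing a device are read sequentially."""
    groups = group_by_device(paths)
    results = {}
    with ThreadPoolExecutor(max_workers=max(1, len(groups))) as pool:
        for partial in pool.map(read_group, groups.values()):
            results.update(partial)
    return results
```

If all four CSVs sit on one device, this degenerates to a single thread reading them in series, which matches the behaviour described in the question.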

Anyway, you can find more information here: multithread read from disk

Hope this helps.