Reputation: 257
I have a process which writes huge data over the network. Let's say it runs on machine A and dumps a file of around 70-80 GB onto machine B over NFS. After process 1 finishes and exits, my process 2 runs on machine A and fetches this file from machine B over NFS. The bottleneck in the entire cycle is the writing and reading of this huge data file. How can I reduce this I/O time? Can I somehow keep the data loaded in memory, ready for use by process 2, even after process 1 has exited?
I'd appreciate ideas on this. Thanks.
Edit: since process 2 'reads' the data directly over the network, would it be better to copy the data locally first and then read from the local disk? I mean, would (read time over network) > (copy to local disk) + (read from local disk)?
Upvotes: 3
Views: 479
Reputation: 215257
Whether you use mmap or plain read/write should make little difference; either way, everything happens through the filesystem cache/buffers. The big problem is NFS. The only way you can make this efficient is by storing the intermediate data locally on machine A rather than sending it all over the network to machine B only to pull it back again right afterwards.
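As a rough sketch of that (the program names and paths here are placeholders, not from the question), process 1 would write to a scratch directory on machine A's local disk instead of the NFS mount, and process 2 would read back from the same path:
# on machine A: keep the intermediate file on local disk, bypassing NFS entirely
./process1 > /local/scratch/intermediate.dat
./process2 < /local/scratch/intermediate.dat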
Upvotes: 1
Reputation: 1449
There is a lot of network and I/O overhead with this approach, so you may not be able to reduce the latency much further.
Upvotes: 1
Reputation: 393064
Use tmpfs to leverage memory as (temporary) files.
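A minimal sketch of the tmpfs approach, assuming machine A has enough free RAM for the file on top of the rest of the workload (the size and mount point below are just examples):
# on machine A, as root: back a temporary mount point with RAM
mount -t tmpfs -o size=80G tmpfs /mnt/ramdisk
# then point process 1's output and process 2's input at a file under /mnt/ramdisk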
Alternatively, use mbuffer with netcat to relay from one port to another without storing the intermediate stream on disk, while still allowing the producer and consumer to run at varying speeds:
machine1:8001 -> machine2:8002 -> machine3:8003
On machine2, configure a job like:
netcat -l -p 8002 | mbuffer -m 2G | netcat machine3 8003
This allows at most 2 GB of data to be buffered. If the buffer fills to 100%, machine2 will simply block reads from machine1, delaying the output stream without failing.
When machine1 has completed its transmission, the second netcat will stay around until the mbuffer is depleted.
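For completeness, a sketch of the two endpoints in this setup (the process names are assumptions, not taken from the question):
# machine1: stream the producer's output into the relay on machine2
./process1 | netcat machine2 8002
# machine3: listen for the relayed stream and feed it to the consumer
netcat -l -p 8003 | ./process2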
Upvotes: 0
Reputation: 37437
If you want to keep the data loaded in memory, then you'll need 70-80 GB of RAM.
The best option is probably to attach local storage (a hard disk drive) to machine A and keep this file locally.
Upvotes: 2
Reputation: 45083
The obvious answer is to reduce network writes, which could save you a significant amount of time and improve reliability; there seems very little point in copying a file to another machine only to copy it back. To answer your question more precisely, we would need more information.
Upvotes: 1