frank-dspeed

Reputation: 1112

How to write a single file while reading from multiple input streams in NodeJS

How to write a single file while reading from multiple input streams of the exact same file from different locations with NodeJS.

As it's still not clear, maybe this helps:

I want to get more performance out of the download. Let's say we have 2 locations for the same file, and each can only deliver about 10 MB/s downstream, so I want to download one part from the first location and another part from the second in parallel, to get the file at 20 MB/s.

So both streams need to be joined somehow, and both streams need to know the range they are downloading.

I have 2 examples:

var http = require('http')
var fs = require('fs')

// will write to disk __dirname/file1.zip
function writeFile(fileStream){
  //...
}
// This example assumes downloading from 2 http locations;
// http.get hands the response stream to writeFile
http.get('http://location1/file1.zip', writeFile)
http.get('http://location2/file1.zip', writeFile)
var fs = require('fs')

// will write to disk __dirname/file1.zip
function writeFile(fileStream){
  //...
}

// this example reads the same file from 2 different disks
writeFile(fs.createReadStream('/mount/volume1/file1.zip'))
writeFile(fs.createReadStream('/mount/volume2/file1.zip'))
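
To make the idea concrete, here is a minimal sketch (the URLs are hypothetical and the total size is assumed; normally you would take it from a HEAD request) of how two ranged HTTP downloads could be joined into one file, by pre-allocating the file and writing each range at its own offset:

const http = require('http')
const fs = require('fs')

const FILE = __dirname + '/file1.zip'
const SIZE = 1024 * 1024 // assumed total size, normally taken from a HEAD request
const half = Math.floor(SIZE / 2)

// pre-allocate the target file so both streams can write into it
fs.writeFileSync(FILE, Buffer.alloc(SIZE))

function fetchRange(url, start, end) {
  // flags 'r+' keeps the pre-allocated file, `start` positions the writes
  const out = fs.createWriteStream(FILE, { flags: 'r+', start })
  http.get(url, { headers: { Range: `bytes=${start}-${end}` } }, (res) => {
    res.pipe(out)
  })
}

fetchRange('http://location1/file1.zip', 0, half - 1)
fetchRange('http://location2/file1.zip', half, SIZE - 1)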

How I think it would work

Each ReadStream needs to check whether a defined content range has already been written before reading the next chunk from its file, and maybe each one should start reading at a random location in the file.

If the total file content length is X, we divide it into smaller chunks and create a map where each entry has a fixed content length, so we know which parts we already have and which parts we are still downloading.
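
A minimal sketch of such a map (the chunk size and field names are just assumptions):

const CHUNK_SIZE = 1024 * 64;

function buildChunkMap(totalLength) {
  const chunks = [];
  for (let start = 0; start < totalLength; start += CHUNK_SIZE) {
    chunks.push({
      start,
      end: Math.min(start + CHUNK_SIZE, totalLength) - 1,
      state: 'pending' // -> 'downloading' -> 'done'
    });
  }
  return chunks;
}

// each reader takes the next pending chunk instead of re-reading ranges
function nextChunk(chunks) {
  const chunk = chunks.find((c) => c.state === 'pending');
  if (chunk) chunk.state = 'downloading';
  return chunk;
}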

Trying to answer this question myself

We can try a simple, optimistic chunked read first:

const fs = require('fs')

const SIZE = 64; // read in 64 byte intervals
const buffers = [];
let bytesRead = 0;

function readParallel(filepath, callback) {
  fs.open(filepath, 'r', function (err, fd) {
    fs.fstat(fd, function (err, stats) {
      const totalSize = stats.size;

      while (bytesRead < totalSize) {
        const size = Math.min(SIZE, totalSize - bytesRead);
        const buffer = Buffer.alloc(size);

        // read `size` bytes at `position` into the start of `buffer`
        const read = fs.readSync(fd, buffer, 0, size, bytesRead);
        buffers.push(buffer);
        bytesRead += read;
      }

      callback(Buffer.concat(buffers));
    });
  });
}
// At the end: Buffer.concat(buffers) === the file content

fs.createReadStream() has an option you can pass to specify the start position:

let f = fs.createReadStream("myfile.txt", {start: 1000});

You could also open a normal file descriptor with fs.open(), then use the position argument of fs.read() to read one byte right before where you want the stream to be positioned. You can then pass that file descriptor into fs.createReadStream() as an option, and the stream will start with that file descriptor and position (though the start option of fs.createReadStream() is obviously a bit simpler).
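
A minimal sketch of that fd variant (here the descriptor's position is advanced with a plain, non-positioned read, since a positioned read does not move the file offset on every platform):

const fs = require('fs')

fs.open('myfile.txt', 'r', (err, fd) => {
  if (err) throw err
  // consume the first 1000 bytes with position = null,
  // which advances the descriptor's current file position
  fs.read(fd, Buffer.alloc(1000), 0, 1000, null, (err) => {
    if (err) throw err
    // with `fd` given and no `start`, the stream reads
    // sequentially from the current file position
    const stream = fs.createReadStream(null, { fd })
    stream.on('data', (chunk) => console.log(chunk.length))
  })
})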

Upvotes: 1

Views: 1681

Answers (2)

frank-dspeed

Reputation: 1112

Range Locking

The answer is advisory locking; it is as simple as torrent clients do it:

  1. split the whole file (or a part of it) into multiple smaller parts
  2. lock each file range and fetch that range from one of a list of sources
  3. use the file created in step 1 as the driver for a FIFO queue; it contains all the metadata

To get a file from multiple sources, a JS implementation would look like the following, assuming all sources serve the same file (I put no error handling in here):

const queue = [];
const sources = ['https://example.com/file', 'https://example1.com/file'];
// inside an async context (or an ES module with top-level await)
const fileSize = Number(
  (await fetch(sources[0], { method: 'HEAD' })).headers.get('content-length')
);

const targetBuffer = new Uint8Array(fileSize);
const charset = 'x-user-defined';

// Maps to the Unicode Private Use Area so you can get bytes as chars
const binaryRawEnablingHeader = `text/plain; charset=${charset}`;

const requestDefaults = {
  headers: {
    'Content-Type': binaryRawEnablingHeader,
    'range': 'bytes=2-5,10-13'
  }
};

// downloadPlan = some logic that puts those bytes into the target (WiP)

// use response.text() and then convert that to bytes via the
// Unicode Private Use Area 0xF700-0xF7FF.
const convertToAbyte = (chars) =>
  Array.from(chars, (char) => char.charCodeAt(0) & 0xff);
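
A minimal sketch of what that download plan could look like (hypothetical; it uses response.arrayBuffer() instead of the text()/charset round-trip, and assumes both sources support HTTP range requests):

async function downloadFromSources(sources, fileSize, targetBuffer) {
  const chunkSize = Math.ceil(fileSize / sources.length)
  await Promise.all(sources.map(async (source, i) => {
    const start = i * chunkSize
    const end = Math.min(start + chunkSize, fileSize) - 1
    const response = await fetch(source, {
      headers: { range: `bytes=${start}-${end}` }
    })
    // write the fetched range into the shared buffer at its own offset
    const bytes = new Uint8Array(await response.arrayBuffer())
    targetBuffer.set(bytes, start)
  }))
  return targetBuffer
}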

Upvotes: 0

Avraham

Reputation: 938

Using csv-parse with csv-stringify from the CSV Project.

const fs = require('fs');
const { parse } = require('csv-parse');
const { stringify } = require('csv-stringify');

const stringifier = stringify();
const writeFile = fs.createWriteStream('out.csv');
stringifier.pipe(writeFile);

// `end: false` keeps whichever source finishes first from closing the shared stringifier
const p1 = fs.createReadStream('file1.csv').pipe(parse());
const p2 = fs.createReadStream('file2.csv').pipe(parse());
p1.pipe(stringifier, { end: false });
p2.pipe(stringifier, { end: false });

// end the stringifier once both sources are done
let pending = 2;
const done = () => { if (--pending === 0) stringifier.end(); };
p1.on('end', done);
p2.on('end', done);

Here I parse each file separately (using a different parse stream for each source), then pipe both into the same stringify stream, which combines them, and then write to the destination. Note that because both sources are read in parallel, rows from the two files may interleave in the output.

Upvotes: 0
