boop

Reputation: 7787

How to use a NodeJS Stream twice?

I have a readable NodeJS stream that I want to use twice. Disclaimer: I'm not very comfortable with streams.

Why?

My service lets users upload images. I want to avoid uploading the same image twice.

My workflow is as follows:

upload image per ajax
get hash of image
if hash in database
  return url from database
else
  pass hash to resize&optimize pipeline
  upload image to s3 bucket
  get hash of image and write it to database with url
  return s3 url

I get the hash of my stream with hashstream and optimize my image with gm.

Hashstream takes a stream, closes it, creates a hash and returns it with a callback.

My question is: What would be the best approach to combine both methods?

Upvotes: 6

Views: 4650

Answers (1)

Michał Karpacki

Reputation: 2658

There are two ways to solve it:

  • Buffer the stream

    Since you don't know whether your stream will be used again, you can simply buffer it up somehow (for example by handling data events yourself, or by using a module such as accum). As soon as you know the outcome of the hash lookup, you simply write the whole accumulated buffer into the gm stream.

  • Use stream.pipe twice to "tee"

    You probably know the POSIX command tee; in the same way, you can push all the data into two places. Here's an example implementation of a tee method in my "scramjet" stream, but I guess for you it'd be quite sufficient to simply pipe twice. Then, as soon as your hash is calculated and you run into the first condition (the hash is already in the database), I'd simply send an end to the second pipe.
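The buffering option could look roughly like the sketch below. It uses only Node's built-in `crypto` and `stream` modules rather than hashstream, accum, or gm, and it assumes the upload stream emits `Buffer` chunks. The names `bufferAndHash`, `handleUploadBuffered`, `lookupUrl`, and `processImage` are hypothetical placeholders for your own database lookup and resize/upload pipeline.

```javascript
const crypto = require('crypto');
const { Readable } = require('stream');

// Hypothetical helper: reads the whole stream into memory while hashing it.
// Assumes the stream emits Buffer chunks (no encoding set, not object mode).
async function bufferAndHash(input) {
  const hash = crypto.createHash('sha1');
  const chunks = [];
  for await (const chunk of input) {
    hash.update(chunk);
    chunks.push(chunk);
  }
  return { digest: hash.digest('hex'), data: Buffer.concat(chunks) };
}

// Hypothetical workflow. `lookupUrl(digest)` stands in for the database
// query and `processImage(stream, digest)` for the gm + S3 pipeline.
async function handleUploadBuffered(input, lookupUrl, processImage) {
  const { digest, data } = await bufferAndHash(input);
  const knownUrl = await lookupUrl(digest);
  if (knownUrl) {
    return knownUrl; // duplicate image: nothing else to do
  }
  // Replay the buffered bytes as a fresh readable stream.
  return processImage(Readable.from([data]), digest);
}
```

The whole image sits in memory until the lookup returns, which is the memory cost this option trades for not starting the optimization early.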

The right choice depends on whether you want to conserve memory or CPU. For less memory, use two pipes (your optimization process will start, but you'll cancel it before it outputs anything). For less CPU and fewer processes, I'd go for buffering.
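The two-pipe variant could be sketched as follows, again with Node's built-in modules only. `teeWithHash` and `handleUploadTeed` are hypothetical names, and `startProcessing(branch)` is an assumed wrapper around your gm/S3 pipeline that returns `{ result, cancel }`. Note that the second branch has to be consumed right away: with two pipes the source runs at the pace of the slower consumer, so an unread branch would stall the hash as well.

```javascript
const crypto = require('crypto');
const { PassThrough, Readable } = require('stream');

// Hypothetical helper: pipes one source into a hash stream and a second branch.
function teeWithHash(source) {
  // crypto.Hash is itself a stream; with an encoding set it emits the digest as hex.
  const hash = crypto.createHash('sha1').setEncoding('hex');
  const branch = new PassThrough();
  source.pipe(hash);   // first consumer: the hash
  source.pipe(branch); // second consumer: the image pipeline
  const digest = new Promise((resolve, reject) => {
    hash.once('data', resolve); // emitted once the source has ended
    source.once('error', reject);
  });
  return { digest, branch };
}

// Hypothetical workflow. `lookupUrl(digest)` stands in for the database query;
// `startProcessing(branch)` must begin reading immediately and return
// { result: Promise<string>, cancel: () => void }.
async function handleUploadTeed(source, lookupUrl, startProcessing) {
  const { digest, branch } = teeWithHash(source);
  const job = startProcessing(branch);
  const knownUrl = await lookupUrl(await digest);
  if (knownUrl) {
    job.result.catch(() => {}); // ignore errors caused by the cancellation
    job.cancel();               // duplicate image: drop the started work
    return knownUrl;
  }
  return job.result;
}
```

Here nothing is held in memory beyond the stream buffers, but the optimization work is started for every upload, including duplicates.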

All in all, I would consider buffering only if you can easily scale to more incoming images, or if you know exactly how much load there is and that you can handle it. Either way there will be limits, and these limits need to be handled somehow. If you can start a couple more instances, you should be better off using more CPU and keeping memory at a sensible level.

Upvotes: 5
