Reputation: 311
Upon requesting a certain URL, I'd like to save the incoming markdown, transform it into HTML, then pass it on for display. Using an observer, I can get the channel and set the channel's listener to my special "override" listener via nsITraceableChannel, then pass the data on to the original listener for display, but I'm puzzled about what to do at that point. The onDataAvailable method passes an nsIInputStream, which cannot be read from JavaScript code directly. While I could wrap that in an nsIScriptableInputStream and read from that, it seems like that would introduce a lot of wrapping around a simple read operation that might repeat many times. I'd rather just read it all at once, in nice self-contained binary code.
What I want to do is use NetUtil.asyncCopy to copy that input stream to a storage stream, and when that finishes, transform the contents of the storage stream into the data to pass to the original listener. But wouldn't that keep calling my override listener with onDataAvailable too? The documentation says onDataAvailable MUST read exactly that many bytes from the inputStream before returning, so I guess using nsIScriptableInputStream is mandatory? Would I just read from the scriptable input stream, then ignore and discard the result, while the async copy continued in the background? Does asyncCopy replace my override listener with its own listener, which would be fine, or do they stack, which would be bad?
Ideally I'd want something that takes an output stream and returns a stream listener for passing to nsITraceableChannel.setNewListener, but I can't find anything like that, or even a list of what implements nsIStreamListener.
So, something like this:
var {classes: Cc, interfaces: Ci, results: Cr, Constructor: CC, utils: Cu } = Components;
var hack = 3;
/*
0 = use nsIScriptableInputStream
1 = use NetUtil.asyncCopy, and then use nsIScriptableInputStream but ignore what it reads
2 = use NetUtil.asyncCopy, and it overrides our own override listener from then on
3 = use NetUtil.asyncCopy, but our own override listener keeps getting onDataAvailable, but we just ignore it
*/
/* needed for hacks 0 and 1 in onDataAvailable, and in finished() for every
   variant that buffers into the storage stream */
var ScriptableInputStream = CC("@mozilla.org/scriptableinputstream;1",
                               "nsIScriptableInputStream", "init");
var StorageStream;
var NetUtil; // populated by Cu.import below
if (hack != 0) {
  StorageStream = Cc["@mozilla.org/storagestream;1"];
  Cu.import("resource://gre/modules/NetUtil.jsm");
}
function HTMLRestyler(tracingChannel) {
  this.originalListener = tracingChannel.setNewListener(this);
  if (hack == 0) {
    this.data = "";
  } else {
    /* I wonder if creating one of these is as expensive as creating an
       nsIScriptableInputStream for every read operation? */
    this.storage = StorageStream.createInstance(Ci.nsIStorageStream);
    // segment size / max size; 256 bytes would be far too small for a page
    this.storage.init(8192, 0xffffffff, null);
    this.data = this.storage.getOutputStream(0);
  }
}
HTMLRestyler.prototype = {
  QueryInterface: function(id) {
    if (id.equals(Components.interfaces.nsIStreamListener) ||
        id.equals(Components.interfaces.nsISupportsWeakReference) ||
        id.equals(Components.interfaces.nsISupports))
      return this;
    throw Components.results.NS_NOINTERFACE;
  },
  onDataAvailable: function(request, context, inputStream, offset, count) {
    if (hack == 0) {
      /* the easy way (ow my CPU cycles) */
      var scriptStream = new ScriptableInputStream(inputStream);
      this.data += scriptStream.read(count);
      scriptStream.close();
    } else if (hack == 1) {
      if (!this.iscopying) {
        NetUtil.asyncCopy(inputStream, this.data, this.finished.bind(this));
        this.iscopying = true;
      }
      /* still have to read the data twice: once in asyncCopy, once here.
         Is there any point to doing this? */
      var scriptStream = new ScriptableInputStream(inputStream);
      var ignored = scriptStream.read(count);
      scriptStream.close();
    } else if (hack == 2) {
      NetUtil.asyncCopy(inputStream, this.data, this.finished.bind(this));
      /* the "best" way (probably doesn't work):
         onDataAvailable and onStopRequest no longer called from here on,
         as this listener has been overridden */
    } else if (hack == 3) {
      if (!this.iscopying) {
        NetUtil.asyncCopy(inputStream, this.data, this.finished.bind(this));
        this.iscopying = true;
      }
      /* but no scriptable input stream needed, because it's OK to just ignore
         the inputStream here in the override listener and not read the data */
    }
  },
  onStartRequest: function(request, context) {
    this.request = request;
    this.context = context;
  },
  onStopRequest: function(request, context, statusCode) {
    if (hack == 0) {
      // for the asyncCopy variants, finished() is invoked by the copy callback instead
      this.finished(statusCode);
    }
  },
  finished: function(status) {
    this.originalListener.onStartRequest(this.request, this.context);
    if (hack != 0) {
      // read everything back out of the storage stream, starting at offset 0
      var scriptStream = new ScriptableInputStream(this.storage.newInputStream(0));
      this.data = scriptStream.read(scriptStream.available());
      scriptStream.close();
    }
    // onDataAvailable needs a stream, so wrap the transformed string
    var transformed = this.transform(this.data);
    var stringStream = Cc["@mozilla.org/io/string-input-stream;1"]
                         .createInstance(Ci.nsIStringInputStream);
    stringStream.setData(transformed, transformed.length);
    this.originalListener.onDataAvailable(this.request, this.context,
                                          stringStream, 0, transformed.length);
    this.originalListener.onStopRequest(this.request, this.context, status);
  },
  transform: function(data) {
    return "derp " + data;
  }
};
Upvotes: 0
Views: 179
Reputation: 33162
As you pointed out yourself already, onDataAvailable by contract must consume all data. Hence the async APIs won't suffice. This leaves sync APIs:

- Write the data into an nsIStorageStream or nsIPipe to store it until the response is complete, and get a js-string then.
- Read the data via nsIScriptableInputStream and concatenate it into a js-string.
- Read the data via nsIBinaryInputStream and concatenate it into a js-string, octets, or an ArrayBuffer.

I experimented quite a bit with various ways to efficiently consume data in onDataAvailable for DownThemAll!. In my use case, it was best to use .writeFrom on the output stream end of an nsIPipe, which doesn't require pulling the data out of C++ into JS land first.
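For illustration, here is a minimal sketch of that pipe approach, shaped like the listener in the question; the class name, the 64 KB segment size, and the 0xffffffff segment count are my own assumptions, not something the pipe requires:

/* Sketch (assumptions noted above): buffer the response in an nsIPipe via
   writeFrom, then read it out once at the end. */
var { classes: Cc, interfaces: Ci } = Components;

function PipeListener(tracingChannel) {
  this.originalListener = tracingChannel.setNewListener(this);
  this.pipe = Cc["@mozilla.org/pipe;1"].createInstance(Ci.nsIPipe);
  // blocking input/output, 64 KB segments, effectively unlimited segment count;
  // older Gecko versions also took a trailing nsIMemory allocator (null here)
  this.pipe.init(false, false, 1 << 16, 0xffffffff, null);
}

PipeListener.prototype = {
  // a QueryInterface like the one in the question can be added if needed
  onStartRequest: function(request, context) {
    this.originalListener.onStartRequest(request, context);
  },
  onDataAvailable: function(request, context, inputStream, offset, count) {
    // copy the chunk into the pipe without pulling the bytes into JS
    this.pipe.outputStream.writeFrom(inputStream, count);
  },
  onStopRequest: function(request, context, statusCode) {
    this.pipe.outputStream.close();
    // now that the response is complete, pull everything out as one js-string
    var scriptable = Cc["@mozilla.org/scriptableinputstream;1"]
                       .createInstance(Ci.nsIScriptableInputStream);
    scriptable.init(this.pipe.inputStream);
    var data = scriptable.read(scriptable.available());
    scriptable.close();
    // ...transform `data` and replay it to this.originalListener here
    //    (see the js-string example further down for the replay)...
    this.originalListener.onStopRequest(request, context, statusCode);
  }
};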
However, your case may differ: you need to actually modify/transform the data, so you need a js-string anyway for the actual transformation. Storing data in some XPCOM stream like nsIStorageStream or nsIPipe will still have you read the entire thing into a js-string in the end, modify it, and put it back into another stream that you can pass on to the next onDataAvailable listener down the chain. This means you have additional memory overhead (a storage stream plus a js-string instead of just a js-string) while saving only a very, very small amount of XPCOM overhead. The same goes for array buffers.
So in the end, given your use case, I'd argue for concatenating the received data into a js-string directly. However, you should measure timings and memory use yourself for the various options and decide then.
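A minimal sketch of that direct js-string approach, assuming a listener shaped like the one in the question (the class name and the trivial transform() are placeholders of mine; the replay via nsIStringInputStream is one common way to hand the transformed text to the original listener):

/* Sketch (names are placeholders): concatenate the chunks into a js-string
   via nsIBinaryInputStream, transform once at onStopRequest, and replay the
   result to the original listener through an nsIStringInputStream. */
var { classes: Cc, interfaces: Ci } = Components;

function ConcatenatingRestyler(tracingChannel) {
  this.originalListener = tracingChannel.setNewListener(this);
  this.data = "";
}

ConcatenatingRestyler.prototype = {
  onStartRequest: function(request, context) {
    this.originalListener.onStartRequest(request, context);
  },
  onDataAvailable: function(request, context, inputStream, offset, count) {
    var binary = Cc["@mozilla.org/binaryinputstream;1"]
                   .createInstance(Ci.nsIBinaryInputStream);
    binary.setInputStream(inputStream);
    this.data += binary.readBytes(count); // consumes exactly `count` bytes, as required
  },
  onStopRequest: function(request, context, statusCode) {
    var transformed = this.transform(this.data);
    // onDataAvailable wants a stream, so wrap the transformed string
    var stringStream = Cc["@mozilla.org/io/string-input-stream;1"]
                         .createInstance(Ci.nsIStringInputStream);
    stringStream.setData(transformed, transformed.length);
    this.originalListener.onDataAvailable(request, context,
                                          stringStream, 0, transformed.length);
    this.originalListener.onStopRequest(request, context, statusCode);
  },
  transform: function(data) {
    return data; // placeholder for the markdown-to-HTML conversion
  }
};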
What would more likely have a bigger impact, in particular on memory use, is writing a stateful parser/transformer that does not need to cache the whole response first, but transforms the data as it arrives.
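For completeness, a sketch of that chunk-wise shape; transformChunk() below is a deliberately trivial stand-in, since a real markdown-to-HTML converter would have to keep parser state (for example a partially received line) across chunk boundaries:

/* Sketch (names are placeholders): transform and forward each chunk as it
   arrives instead of buffering the whole response. A real stateful
   transformer would keep its parser state on `this` between calls. */
var { classes: Cc, interfaces: Ci } = Components;

function StreamingRestyler(tracingChannel) {
  this.originalListener = tracingChannel.setNewListener(this);
}

StreamingRestyler.prototype = {
  onStartRequest: function(request, context) {
    this.bytesSent = 0;
    this.originalListener.onStartRequest(request, context);
  },
  onDataAvailable: function(request, context, inputStream, offset, count) {
    var binary = Cc["@mozilla.org/binaryinputstream;1"]
                   .createInstance(Ci.nsIBinaryInputStream);
    binary.setInputStream(inputStream);
    var chunk = this.transformChunk(binary.readBytes(count));
    var out = Cc["@mozilla.org/io/string-input-stream;1"]
                .createInstance(Ci.nsIStringInputStream);
    out.setData(chunk, chunk.length);
    // the downstream listener sees the transformed bytes, chunk by chunk
    this.originalListener.onDataAvailable(request, context, out,
                                          this.bytesSent, chunk.length);
    this.bytesSent += chunk.length;
  },
  onStopRequest: function(request, context, statusCode) {
    // a stateful transformer would flush any buffered partial input here
    this.originalListener.onStopRequest(request, context, statusCode);
  },
  transformChunk: function(text) {
    return text; // placeholder: a real converter must handle chunk boundaries
  }
};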
Upvotes: 2