Conrad Clark

Reputation: 4536

Response.Write hangs when I try to read from request and write to response at the same time

Problem

This has been driving me insane for days. We have a process here to import records from a CSV file into the database, through an admin page in an ASP.NET Web Forms (.NET 4.0) project. The process itself was too slow, and I was responsible for making it faster. I started by changing the core logic, which gave a good performance boost.

But if I upload large files (well, relatively large, about 3 MB tops), I have to wait for the upload to finish before I can start importing, and I don't return any progress to the client while I do so. The process itself is not that long, taking about 5 to 10 seconds to complete, and yes, I've considered creating a separate Task and polling the server, but I thought it was overkill.


What have I done so far?

So, to fix this issue, I decided to read the incoming stream and import the values while I'm reading. I created a generic handler (.ashx) and put the following code inside void ProcessRequest(HttpContext context):

using (var stream = context.Request.GetBufferlessInputStream())
{
}

First I remove the headers, then I read the stream (through a StreamReader) until I get a CRLF, convert the line to my model object, and keep reading the CSV. When I get 200 records or so, I bulk update all of them to the database, then keep reading until I either collect another 200 records or reach the end of the file.
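As a rough sketch of that loop (Record.FromCsvLine and BulkUpdate are placeholder names for my model and data layer, not the actual code):

```csharp
using (var stream = context.Request.GetBufferlessInputStream())
using (var reader = new StreamReader(stream))
{
    var batch = new List<Record>(200);
    string line;
    while ((line = reader.ReadLine()) != null)   // ReadLine stops at CRLF
    {
        batch.Add(Record.FromCsvLine(line));
        if (batch.Count == 200)
        {
            BulkUpdate(batch);                   // one round trip per 200 records
            batch.Clear();
        }
    }
    if (batch.Count > 0)
        BulkUpdate(batch);                       // flush the final partial batch
}
```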

This seems to be working, but then, I've decided to stream my response as well. First, I disabled BufferOutput:

context.Response.BufferOutput = false;

Then, I added those headers to my response:

context.Response.AddHeader("Keep-Alive", "true");
context.Response.AddHeader("Cache-Control", "no-cache");
context.Response.ContentType = "application/X-MyUpdate";

Then, after sending those 200 records to the database, I write a response:

response.Write(s);
response.Flush();

s is a string with a fixed size of 256 chars. I know 256 chars doesn't always equate to 256 bytes, but I just wanted to be sure I wouldn't write walls of text to the response and mess something up.

Here's its format:

| pipe (record delimiter)
1 or 0 (success or failure)
; (field delimiter)
error message (if applicable)
| pipe (next record delimiter, and so on)

Example:

|0;"Invalid price on line 123"|1;|1;|0;"Invalid id on line 127"|
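To make the format concrete, here is a rough sketch of how the client could split such a chunk back into records (parseProgress is a name I made up; it also assumes each chunk ends on a record boundary, which real streaming doesn't guarantee):

```javascript
// Parse a progress chunk in the |status;"message"| format described above.
// Returns one object per record; empty segments between pipes are skipped.
function parseProgress(chunk) {
    return chunk.split('|')
        .filter(function (seg) { return seg.length > 0; })
        .map(function (seg) {
            var sep = seg.indexOf(';');
            return {
                success: seg.charAt(0) === '1',
                // Strip the surrounding quotes; empty message becomes null.
                error: seg.slice(sep + 1).replace(/^"|"$/g, '') || null
            };
        });
}
```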

On the client side, here's what I have (just the request part):

function import(){
    var formData = new FormData();
    var file = $('[data-id=file]')[0].files[0];
    formData.append('file', file);

    var xhr = new XMLHttpRequest();
    var url = "/Site/Update.ashx";

    xhr.onprogress = updateProgress;

    xhr.open('POST', url, true);
    xhr.setRequestHeader("Content-Type", "multipart/form-data");
    xhr.setRequestHeader("X-File-Name", file.name);
    xhr.setRequestHeader("X-File-Type", file.type);
    xhr.send(formData);
}

function updateProgress(evt){
    debugger;
}
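Since the browser appends each flushed chunk to xhr.responseText, a streaming onprogress handler only needs the part it hasn't seen yet. A sketch of that bookkeeping (makeChunkReader is my own name, not part of the original code):

```javascript
// Returns a function that, given the full responseText so far,
// yields only the newly arrived portion.
function makeChunkReader() {
    var processed = 0;  // characters of responseText already handled
    return function (responseText) {
        var chunk = responseText.slice(processed);
        processed = responseText.length;
        return chunk;
    };
}

// Usage inside the handler:
// var readChunk = makeChunkReader();
// function updateProgress(evt) {
//     var newRecords = readChunk(evt.target.responseText);
// }
```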

What happened :(

I assume that if I call Response.Flush, it's not 100% guaranteed to reach the client, correct? Or is the client itself the problem? If it is the client, why does the server hang when I call Response.Write too much?

EDIT: As an addendum, if I throw the same piece of code into an aspx page, it works. So I believe it has something to do with the xhr (XMLHttpRequest) itself, which is not prepared to process streaming data, it seems.

I'll be glad to give more information if needed.

Upvotes: 2

Views: 1957

Answers (1)

Conrad Clark

Reputation: 4536

After another day bashing my head on this one, I think I finally got it. For those interested, I'm going to post the answer here.

First of all, I said my intention was to read the stream and process the CSV file at the same time, right?

using (var stream = context.Request.GetBufferlessInputStream())
{
}

The first problem I didn't account for was that I did everything synchronously, so I would only continue reading the stream as I processed the file. While that makes sense, it's not optimal, since I can read the file faster than I can analyze and update the data.

However, that is not what hangs the upload. I was calling Response.Write before I had read the whole CSV file. Long story short, I was trying to send a response before I had received the request completely.

I'm not sure what the expected behavior of Response.Write is when it executes before the whole request has been read, but something tells me it's impossible for it to send information to the client at the same time the client is sending information to the server, unless I had some kind of full-duplex connection. I saw this question a few hours ago, "HTTP pipelining - concurrent responses per connection", and although it doesn't answer my question, the picture made me curious whether the response could happen together with the request.

Then I stumbled upon this link, apparently from the working group charged with maintaining and developing the "core" specifications for HTTP: Can the response entity be transmitted before all the request entity has been read?, which basically asks:

Can the response entity be transmitted before all the request entity has been read?

I have been implementing an HTTP/1.1 server which consumes the request entity lazily, as need by the application.

If the application decides that it can generate some or all of the response before it finishes reading the whole request entity, is that allowed?

What is the answer if the status is not an error code? Can the server begin transmitting the response entity before all of the request entity has been read? (Assume that the server is intelligent enough to avoid deadlock by always reading request data when it arrives.)

The reply is a short yes, but the question is developed as the email replies go on. One thing that caught my attention was specifically this part: "server is intelligent enough to avoid deadlock by always reading request data when it arrives". I kept reading:

In short, the result of the response entity transmitted before all the request entity has been read is unpredictable.

By the way, pure tunnelling leads to deadlock: the application can get stuck writing if the client isn't reading the response until it transmits all the request, and all the TCP windows fill up

It's not possible to omit the buffering somewhere: for clients which send a whole request before reading the response, the entire request has to be buffered or stored somewhere, either in the server or in the application, to resolve the deadlock.

Response.Write was getting stuck, it seems.

Though I know forum talk is no official paper (even the working group's own forum talk), I guess this gave me the insight I needed to solve my problem.

I also tried to dig into the .NET code to double-check whether that was the root of the problem, but I ran into native calls and gave up.

So afterwards I changed my code to upload first, then import the data, emitting the results while doing so, and added two progress bars: one for uploading and another for processing.
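On the client, the two phases map onto two different events: xhr.upload.onprogress fires while the request body is being sent, and xhr.onprogress fires as response chunks arrive afterwards. A sketch of that wiring (updateUploadBar and updateProcessingBar are hypothetical hooks for the two progress bars, not my original code):

```javascript
// Two-phase flow: upload first, then stream the processing results.
function importTwoPhase(file) {
    var xhr = new XMLHttpRequest();
    xhr.open('POST', '/Site/Update.ashx', true);

    // Fires repeatedly while the request body is uploading (first bar).
    xhr.upload.onprogress = function (evt) {
        if (evt.lengthComputable) {
            updateUploadBar(evt.loaded / evt.total);
        }
    };

    // Fires as response chunks arrive, after the upload completed (second bar).
    xhr.onprogress = function (evt) {
        updateProcessingBar(evt.target.responseText);
    };

    var formData = new FormData();
    formData.append('file', file);
    xhr.send(formData);
}
```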

It ran smoothly and worked as expected. No more hangs or slow calls to Response.Write.

If I wanted, I could still import the data while uploading the file, but only if I started writing the response after receiving all of the request data.

Mystery solved. Thanks to everyone who read the question. I won't accept my own answer yet; I'll wait 2 or 3 days to check whether anyone has a better explanation for this incident.

Upvotes: 1
