Starnuto di topo

Reputation: 3569

How to design HTTP API to push massive data?

I need to provide an HTTP API for clients to push massive data, in the shape of a set of records. My first idea was to provide a set of three calls, like:

The first call initializes a temporary data structure and gives the user an identifier, so that subsequent calls can refer to it and data from different users does not get mixed up. The second call is invoked as many times as needed, until all data has been sent to the server. Finally, by invoking the last call, the client confirms that all data has been pushed, so the server can process the temporary data it has stored.
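As a rough illustration of the three-call flow described above, here is a minimal in-memory sketch. The names `UploadService`, `begin_upload`, `push_records` and `commit_upload` are hypothetical placeholders, not the actual API, and a plain dictionary stands in for the server's temporary storage.

```python
import uuid


class UploadService:
    """In-memory stand-in for the server side of the three-call design."""

    def __init__(self):
        self._pending = {}   # upload id -> records received so far
        self.processed = []  # records from uploads that were committed

    def begin_upload(self):
        """Call 1: allocate temporary storage and return its identifier."""
        upload_id = uuid.uuid4().hex
        self._pending[upload_id] = []
        return upload_id

    def push_records(self, upload_id, records):
        """Call 2: append one batch; may be invoked any number of times."""
        self._pending[upload_id].extend(records)

    def commit_upload(self, upload_id):
        """Call 3: the client confirms completion; process the stored data."""
        records = self._pending.pop(upload_id)
        self.processed.extend(records)
        return len(records)


# Hypothetical client session: one begin, two pushes, one commit.
service = UploadService()
upload_id = service.begin_upload()
service.push_records(upload_id, [{"n": 1}, {"n": 2}])
service.push_records(upload_id, [{"n": 3}])
count = service.commit_upload(upload_id)
```

Every call after the first carries the identifier, so the temporary data of different clients stays separate.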

In general, it's considered a good practice to conform to REST principles, but this strategy of uploading large data clearly violates the REST principle of being stateless. For this reason, I'm looking for some better alternative way of doing the job. References to well-known patterns would be appreciated!

Upvotes: 0

Views: 1196

Answers (2)

Matt Timmermans

Reputation: 59174

First, note that your current idea has a fatal flaw: If the client is disconnected during PushSomeData, then it has no way to know whether or not the push succeeded, and can't reliably resume the operation. The final solution has to fix that.
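To make the flaw concrete, here is a small simulation, assuming the push call simply appends each batch to what was already received. `AppendingServer` and the `drop_response` flag are made up for the illustration; the flag simulates a connection that is lost after the server stored the batch but before the client saw the response.

```python
class AppendingServer:
    """Hypothetical server whose push call appends each batch it receives."""

    def __init__(self):
        self.records = []

    def push_some_data(self, batch, drop_response=False):
        self.records.extend(batch)  # the batch is stored first...
        if drop_response:
            # ...but the client never sees the confirmation.
            raise ConnectionError("connection lost before the response arrived")
        return "ok"


def push_with_retry(server, batch, fail_first):
    """Client side: retry once after a connection error."""
    try:
        return server.push_some_data(batch, drop_response=fail_first)
    except ConnectionError:
        # The client cannot tell whether the batch was stored, so it resends.
        return server.push_some_data(batch)


server = AppendingServer()
push_with_retry(server, ["a", "b"], fail_first=True)
stored = list(server.records)  # the batch ends up stored twice
```

If the client does not retry, it risks losing the batch whenever the failure happened before the server stored it; if it does retry, it risks the duplication shown here.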

With that out of the way...

If you want to be able to resume an interrupted transfer, then there has to be some state somewhere, but:

  • Unless your API is all about storage, the resource you're updating should not also be providing the capabilities for managing the state of uploads in progress. You should move those to a different resource that is all about that; and
  • The client needs to know about the state of its upload operation, but the server does not. You can have a stateless server that stores upload parts on behalf of the client, but remains stateless because it doesn't associate these parts to an "upload in progress" or track the state of the upload process in any way. I'll show you what I mean below.

The most REST-like implementation of this capability would be like this:

  1. You provide two forms of the PushData API. In one form, you just accept the data. In the second form, you instead accept a list of URLs from which parts of data can be retrieved.
  2. Provide clients with a private area in which they can store parts of the data using their own identifiers. You should provide full CRUD capabilities so that clients can manage their upload state however they see fit. Essentially this is just like a remote file system.

So in the normal upload process, the client just uploads the parts to its private space, and then sends the URLs for all the parts to the PushData API. The server doesn't need to know anything about the actual state of the client's upload.
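A minimal in-memory sketch of that process might look like the following. The `StagingArea` class, the `/staging/<client>/<name>` URL layout and the `push_data` function are all assumptions made for the illustration; a real service would expose them over HTTP (PUT, GET, DELETE) rather than as method calls.

```python
class StagingArea:
    """Per-client store addressed by client-chosen names (hypothetical)."""

    def __init__(self):
        self._parts = {}  # (client, name) -> list of records

    def put(self, client, name, records):
        # PUT semantics: writing the same name again overwrites the part.
        self._parts[(client, name)] = list(records)
        return f"/staging/{client}/{name}"

    def get(self, url):
        _, _, client, name = url.split("/", 3)
        return self._parts[(client, name)]

    def names(self, client):
        return sorted(n for (c, n) in self._parts if c == client)

    def delete(self, client, name):
        self._parts.pop((client, name), None)


def push_data(staging, part_urls):
    """URL form of PushData: fetch each listed part and concatenate them."""
    records = []
    for url in part_urls:
        records.extend(staging.get(url))
    return records


staging = StagingArea()
urls = [staging.put("alice", "part-1", [1, 2])]
# Simulated lost response: the client resends part-1, which just overwrites it.
staging.put("alice", "part-1", [1, 2])
urls.append(staging.put("alice", "part-2", [3]))
result = push_data(staging, urls)
```

Because the client picks the part names, repeating an upload after a dropped connection is harmless, and the client can list its parts to find out what is already there. The store keeps parts but has no notion of an "upload in progress".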

By separating the management of multi-part uploads from the specific endpoint you're pushing data to, you allow the same procedure to be used for many target resources without having many implementations.

Note that in (1), you can restrict the URLs you accept in any way you like. Initially, you will probably require that they point into the client's private area following the normal process. The API is very future-proof, however, and it allows you to support different or multiple kinds of staging areas in the future. Maybe you want to let clients upload to Amazon S3 and you can get the data from there. In that case you don't even need to do (2)!
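One possible way to enforce the initial restriction is sketched below, assuming part URLs are server-relative paths of the form `/staging/<client>/<name>`. That layout and the helper name `is_allowed_part_url` are assumptions for the example, not part of any existing API.

```python
import posixpath
from urllib.parse import urlsplit


def is_allowed_part_url(url, client):
    """Accept only relative URLs that stay inside the client's staging area."""
    parts = urlsplit(url)
    if parts.scheme or parts.netloc:
        return False  # absolute URLs would point to another host
    # Collapse "." and ".." segments so they cannot escape the prefix.
    path = posixpath.normpath(parts.path)
    return path.startswith(f"/staging/{client}/")


ok = is_allowed_part_url("/staging/alice/part-1", "alice")
other_client = is_allowed_part_url("/staging/bob/part-1", "alice")
traversal = is_allowed_part_url("/staging/alice/../bob/part-1", "alice")
external = is_allowed_part_url("https://example.com/staging/alice/part-1", "alice")
```

Supporting another staging area later would then mean relaxing this one check rather than changing the PushData API.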

There is also a lot of flexibility in the kind of API you provide for (2). You can make it very specifically an upload-staging API. See Amazon S3 for an example. Or you could provide a more file-system-like view.

Upvotes: 2

jaco0646

Reputation: 17066

The design sounds perfectly reasonable to me, and I think it conforms to the ReSTful principle of Statelessness as well.

Each request from the client to server must contain all of the information necessary to understand the request, and cannot take advantage of any stored context on the server.

That requirement is satisfied by the id returned from the initial push. The id is maintained and reused by the application; so there is no session state stored on the server, only resource state.

Upvotes: 3
