amateur
amateur

Reputation: 44605

handling large datasets in web api & odata

I have been working with asp.net web api over recent weeks with great success. It has really assisted me with producing an interface for mobile clients to programme against over http.

I reached a point where I need some assistance.

I have a new endpoint which will can a database and could return 100K results. I am using OData to filter the data and return a paginated set of the data.

As this could happen for mutliple requests, I am concerned with performance. Returning 100K records from the database every time is not ideal. So I have some ideas.

First one is to cache the 100K results and let OData do its magic on this every time. I am working with AppFabric distributed cache as its a load balanced environment. However caching such an amount of data in AppFabric could result in memory complications so think I am best avoiding this.

Next option is to forget about the magic of OData and send the filters I use in to the database and return only the required data each time. So in other words hit the db every time.

I could look at using a caching handler like the version outlined in this article to cache in the http cache -> http://byterot.blogspot.ie/2012/06/aspnet-web-api-caching-handler.html The drawback of this is if the data gets update via another system which it may, the cached data is not expired.

Any other tips as to how I may handle this scenario, large amount of data, filtered with odata in conjunction with web api?

Upvotes: 6

Views: 10886

Answers (4)

Aliostad
Aliostad

Reputation: 81660

The caching that I have pointed out in the blog http://byterot.blogspot.ie/2012/06/aspnet-web-api-caching-handler.html applies to HTTP caching also known as output caching. Actually the data itself is not cached on the server but on the client or mid-stream cache servers, so is not suitable for what you have it mind.

Upvotes: 0

Mark Stafford - MSFT
Mark Stafford - MSFT

Reputation: 4336

This is a question that's likely to result in a wide variety of answers. That said, let me put on my pre-MSFT hat and give you my two cents.

A lot of architecture questions are best answered with the consultant's answer, "It depends." The answer depends in your case on a few things specifically. Some developers have a problem with caching layers because there are additional things to think about. An ACID-compliant database buys you a lot of insurance that you have at least a very finite amount of eventual consistency.

If it were me making this decision, I would be considering a few things:

  • How many rows am I returning on a regular basis?
  • Are they the same rows over and over?
  • How big is that in memory? (100k is really not that many rows; you're right about not wanting those 100k rows to hit the disk every time, but it's probably not a problem to keep them all in memory; SQL Server would probably do this for you anyway.)
  • What am I willing to deal with re: eventual consistency? Do I want some other software to deal with it? (What frequently scares people about caches are things like ensuring that invalidation and insertion get done properly and consistently from different applications/different places in the application.)

Given the information you've already provided (tiered architecture, willingness to try a distributed cache) I think you should pursue a caching layer. There are lots of good caches out there. AppFabric worked fine for us before I worked at Microsoft, but I've also dealt with a variety of other caching layers as well.

Upvotes: 3

Ravian
Ravian

Reputation: 64

best way is to a distributed cache, which you are already using. but the cache provider which you are using i.e. AppFabric, has some limitations. by limitations i mean the feature limitations. check out NCache which is a well mature and feature rich third party distributed cache provider.

if you want to understand the differences of NCache and Appfabric, check the youtube link below, this is FYI....

http://www.youtube.com/watch?v=3CPi1QlskrU

Upvotes: 0

user1527329
user1527329

Reputation: 529

Assuming you use Entity Framework it would be the best option to return the IQueryable of EF directly. This way the magic of OData will work directly on your database. $limit and $take will be mapped directly to your SQL query.

Upvotes: 2

Related Questions