Martin
Martin

Reputation: 40593

3-tiers pattern and large amounts of data

Here is my situation: I am trying to follow as hard as I can the 3-tier pattern (i.e. Presentation, Business and Data layer). When I need data from the DB, the Business layer calls the Data layer which returns the information. The Data layer never return a SqlDataReader or DataTable object, but often an enumeration of custom object known by the Data Access Layer. It works pretty well when the Data layer has to return a list with few objects.

I am now facing this problem, my application (the business layer) must process 500000 records. I could simply add another method to my Data layer and return an IEnumerable but this sound very bad to me. I don't want to load half-million records in memory.

My question is, considering the 3-tier model, how should I handle this case? If I had no 3-tiers pattern, I would simply use SqlDataReader in my business classes. Any suggestions?

UPDATE: The data will not be displayed, so this is not a paging issue (the presentation layer is not involved at all here). I simply have to analyze each record and then keep some of them.

Thanks

Upvotes: 5

Views: 809

Answers (9)

Tim
Tim

Reputation: 11

Do the filtering in the database. there is no need to bring over 500000 records that you are going to filter out anyway. Why bring them all to the middle tier just to remove them. Take care of the operation (data) as early as possible using the SQL Engine in the back end (sproc). Most efficient, similar to checking for basic input checks on the presentation layer before sending to IIS.

Upvotes: 1

cbrcoder
cbrcoder

Reputation: 540

If I am understanding this correctly you want to "Analyze" the records and then keep some of them and get rid of rest of them. Well in that case I think it will be best to handle this within the database itself (PL/SQL or T/SQL). Requirements like these should be a top priority and not the architecture. Since you are not displaying just analyzing, it's best to do in the procedure itself.

Upvotes: 0

Peter Stuifzand
Peter Stuifzand

Reputation: 5104

You can build a abstraction on top of the SqlReader class. That way you don't have to pass the SqlReader directly, but you can still process the objects one at a time.

Think Iterators.

Upvotes: 1

bmb
bmb

Reputation: 6248

In some cases, you have to break the 3-tier boundaries. But before you do, you could ask yourself:

  1. When you "analyze each record and then keep some of them," is that really part of the business logic? Or is it a data-access function? It might be the case that this belongs in the data access layer.

  2. If it is part of the business logic, do you need all 500000 records in order to make a decision about whether to "keep" any individual record? It might be that the business layer should be processing one record at a time. Making 500000 consecutive database calls is not pretty, but if that is what the app should be doing from a conceptual standpoint, there are ways to mitigate that.

I don't recommend doing anything dumb just to keep the 3 tiers separate. But sometimes, when you think you have to cross the line, it's because there is something in the design that needs a second look.

--
bmb

Upvotes: 1

David Robbins
David Robbins

Reputation: 10046

There is no shame in doing whatever analysis you need to at the database level. If you can slice and dice what you need with stored procedure or make the necessary correlations with stored procedures and use an application for more complex operations you should be fine.

The question is, does the user expect to push a button and process all 500K records and see a result? If so, are they willing to sit and watch a spinning gif or will it be satisfactory to receive some type of notification when the process is complete? If processing the 500K is of the utmost importance, does your data model need alteration to support this process? There are processing methods such as Hadoop and message queues that are geared for this high volume, but do you need to go to this extent? You might able to set the expectations of your users before pulling you hair out over performance.

Upvotes: 0

Aaron Digulla
Aaron Digulla

Reputation: 328724

A piece of paper can never trump reality. If your specific problem asks to break the 3-tier paradigm, do it.

Upvotes: 1

Richard Dorman
Richard Dorman

Reputation: 24120

This is not an uncommon problem and occurs frequently in situations where you need to consolidate large amounts of data and present summaries to the user (reports are a typical example). Your solution should be designed with these considerations in mind. It does not make sense to ignore the efficiencies offered by sql readers (or similar tools) when strict coherence to some particular architectural model makes your application inefficient. It is often possible to overcome some of these problems by adapting an architectural model to your needs. Generic architectural models are rarely applicable out of the box. They are guidelines that should be applied to your particular needs.

Upvotes: 0

duffymo
duffymo

Reputation: 308938

Yes, your instinct is correct.

I'm betting that your UI client does not want to look at half a million records at once. Google doesn't return every hit in a single page; you won't, either.

You have a choice as to where and when your application processes those half a million records. You can chunk them into smaller units of work; you can process them asynchronously; you can write a stored procedure and process them in the database without bringing them all over to the middle tier.

The MVC pattern is wonderful, but it's not holy writ. Make the choices that work for your app.

Upvotes: 1

Matt Grande
Matt Grande

Reputation: 12157

I assume you're not displaying 500,000 records to the front end at once? You're probably doing some pagination, right? So, only return one page worth of data from the database at one time.

Upvotes: 2

Related Questions