w0051977
w0051977

Reputation: 15817

Data Warehouse type solution?

I am developing a system that deletes information from other systems. For example, there is a Customers database and customers can be deleted if their most recent order is six years old or greater.

I can think of two approaches to this:

  1. Create classes for each of the systems e.g. Customer (for the customer database), Order (for the Order database) etc. Have a function called Delete in each class for Polymorphism i.e. the client can call Delete in each class, which will delete the necessary records.
  2. Copy all the information required to make a decision about an entity e.g. Order into a copy database using SSIS and produce one big query of everything that should be deleted. This is a Data Warehouse type approach.

I cannot decide which option is better.

Upvotes: 1

Views: 114

Answers (3)

Steven Doggart
Steven Doggart

Reputation: 43743

I would take issue with your first approach. I'm not sure that it makes sense that all of the separate business classes would somehow share the same interface for a Delete method. The parameters needed to delete a customer may not be the same as the parameters needed to delete an order. It seems odd to force them to share the same interface. Doing it that way may unnecessarily restrict the flexibility of your code in the future. Also, it seems odd that the business logic for deleting a customer would involve blindly looping through other objects, in no particular order, and calling a Delete method on all of them. I'd think you'd want to have more control over the order in which they get deleted and how it is handled when one of the delete methods fails.

I would suggest first thinking of it at a higher level. What business methods do you actually need? Based on the scenario you described, I can see two business methods:

  • Get a list of all the customers with no orders in the past 6 years
  • Delete a customer (along with all of its orders)

Since both of those methods are related to customers, it would make sense to group them together into a single customer business class, of some sort. For instance:

Public Interface ICustomerBusiness
    Function GetStaleCustomers(timeSinceLastOrder As TimeSpan) As IList(Of CustomerDto)
    Sub DeleteCustomer(customerId As Integer)
End Interface

Once you've encapsulated the business logic like that, now it doesn't matter how the data access layer is implemented. The client makes calls to those simple business layer methods and doesn't have to care how the logic works behind the scenes. With the business logic encapsulated in its own layer, you will be free to rewrite it in different ways without having to rewrite any of your client code.

So what would the logic inside that business class look like? It could be either a single call to a data access method that does all the work via one massive SQL command, or it could make many calls to do each step separately. That's really up to you. Obviously the former will be more efficient, but the latter will be more flexible. It will all depend on your needs. Here's a simple example of what each might look like:

Public Sub DeleteCustomer(id As Integer)
    _customerDataAccess.DeleteCustomerAndOrders(id)
End Sub

' or...

Public Sub DeleteCustomer(id As Integer)
    For Each i As OrderDto In _orderBusiness.GetOrdersByCustomer(id)
        _orderBusiness.DeleteOrder(i.Id)
    Next
    _customerDataAccess.DeleteCustomer(id)
End Sub

The second option would be more flexible for multiple reasons. For instance:

  • You would have finer control over what is happening and when. That would allow you to provide detailed status updates, if necessary, during the process. It would also allow you to provide detailed trace logging and more precise error messages when something fails.
  • The business logic to delete orders would be broken out into a separate reusable business class. If you needed to delete just an order, from somewhere elsewhere in the code, you would be able to do so via common code.

Upvotes: 2

STW
STW

Reputation: 46394

A third approach is to use a messaging system with multiple agents. This approach is ideal for very complex scenarios.

Here's a scenario:

The user runs a Command to delete an Entity (Order, Customer, etc). The tool the user is working with creates a message in the work queue that represents the user's intent (e.g. "Delete Customer 123")

The message is handled by one or more agents. Each agent is specific to a part of the larger operation, and listens only for relevant messages. All agents work within a single distributed transaction. This means each agent has a very narrow, specific scope, but any agent can reject the overall operation. If an agent needs to perform other sub-tasks it can queue additional messages for those operations (such as deleting each order belonging to the customer).

This approach scales very well, especially for very complex interactions. It avoids any one system having to know about all others. Each agent knows what messages to handle, and handles a very specific task related to that message.

It's more setup initially, but is very extensible (you can add new agents, messages, etc without impacting the existing ones).

If you decide to use this approach take a look at MassTransit for a framework (there are others as well). If you're working in .NET it's a very nice system that's powerful, but approachable. It's Sagas are especially nice for coordinating complex interactions between multiple agents.

Upvotes: 1

user2366842
user2366842

Reputation: 1216

Both seem to have their benefits and drawbacks, I'd personally go the second route, especially if you're dealing with very large amounts of records, as to not tie up the database with continuous hits to delete records.

Upvotes: 1

Related Questions