Ensure referential integrity when deleting domain entities referenced by primary key/id

Question

I'm trying to design my web app with DDD practices in mind. This app deals with the storage of containers in storage locations. A container contains a substance. Most likely, users will search for a substance and want to know in which location to find the container. Moreover, they will want to take inventory of a storage location, i.e. get all containers in that storage location.

This is why I have identified substance, container and storageLocation as aggregates. I have learned that other aggregates should be referenced not directly but by primary key. Now I am wondering what the best way is to ensure referential integrity in my domain layer (i.e. not having references that point to a nonexistent or wrong container), e.g. when deleting containers, since substance and storageLocation have references to containers. Let's assume all references are bidirectional. I am mostly afraid of "forgetting" to add appropriate methods to an entity which might be added later in the project. I am not sure if that even is a "valid" concern when programming.

These are my entities:

@Entity
public class Substance{
    @Id
    @GeneratedValue
    private Long id;

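    @ElementCollection
    private List<Long> containerIds = new ArrayList<>();

    public void addContainer(Container c) { containerIds.add(c.getId()); }
    // getId() returns a Long, so this calls remove(Object) and removes the id itself
    public void removeContainer(Container c) { containerIds.remove(c.getId()); }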
}

@Entity
public class Container{
    @Id
    @GeneratedValue
    private Long id; //+get

    private Long substanceId; //+ get set
    private Long storageLocationId; //+ get set
}

@Entity
public class StorageLocation{
    @Id
    @GeneratedValue
    private Long id;

    @ElementCollection
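    private List<Long> containerIds = new ArrayList<>();

    public void addContainer(Container c) { containerIds.add(c.getId()); }
    // getId() returns a Long, so this calls remove(Object) and removes the id itself
    public void removeContainer(Container c) { containerIds.remove(c.getId()); }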
}

Now, in my controller, I have to get the Substance and StorageLocation entities from the repository, remove the container ID references from them and then remove the container:

public class Acontroller{

    private ContainerRepository containerRepository; // constructor injected
    private SubstanceRepository substanceRepository; // constructor injected
    private StorageLocRepository storageLocRepository; // constructor injected

    public void deleteContainer(Container c){
        Substance sub =  substanceRepository.getByID(c.getSubstanceId());
        sub.removeContainer(c);

        //The same for the storageLocation

        containerRepository.removeContainer(c);
    }
}

And every time I add another entity reference to Container, I will have to expand the controller method.

Is this way of managing the references "by hand" acceptable? If not, how would I go about doing it while retaining the reference by id? Or should I forget about the id and just work with object references?

ps: first SO question, so please be gentle with me and let me know what to change about the question.

Chris Simon · Accepted Answer

Only Model Necessary Associations

"Let's assume all references are bidirectional."

I think this is probably the first assumption you need to question. When modelling your domain entities, it's best to think about the operations that they participate in and the invariants you need to enforce during those operations. If bidirectional references aren't required for those operations and invariants, don't maintain them.

For example, depending on your domain and invariants, you might be able to get away with unidirectional associations: perhaps substance holds a containerId and container holds a storageLocationId.

Chapter 5, "A Model Expressed in Software", of Eric Evans's book Domain-Driven Design has an excellent discussion of this topic, including an explicit debunking of the usual first assumption that references must be bidirectional.
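To make the idea concrete, here is a minimal sketch of one possible unidirectional layout. It assumes the direction already present in the question's Container (the container holds substanceId and storageLocationId, and nothing points back); whether that direction fits depends on your invariants. InMemoryContainerRepository and its finder methods are hypothetical stand-ins, not part of any framework.

```java
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Container is the only aggregate holding references, and only by id.
// (A record is used purely for brevity in this sketch.)
record Container(long id, long substanceId, long storageLocationId) {}

// Hypothetical in-memory stand-in for a real ContainerRepository.
final class InMemoryContainerRepository {
    private final Map<Long, Container> byId = new HashMap<>();

    void save(Container container) {
        byId.put(container.id(), container);
    }

    // Nothing else stores a container id, so a delete touches one aggregate only.
    void remove(long containerId) {
        byId.remove(containerId);
    }

    // "Which containers are in this location?" is answered by a query,
    // not by a containerIds list kept on StorageLocation.
    List<Container> findByStorageLocationId(long storageLocationId) {
        return byId.values().stream()
                .filter(c -> c.storageLocationId() == storageLocationId)
                .toList();
    }

    // "Where can I find this substance?" works the same way.
    List<Container> findBySubstanceId(long substanceId) {
        return byId.values().stream()
                .filter(c -> c.substanceId() == substanceId)
                .toList();
    }
}
```

With this layout there is no list on Substance or StorageLocation that could be forgotten when a container is deleted, which is the concern raised in the question.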

Is "Delete" a Business Operation?

Expanding on @VoiceOfUnreason's answer and Udi Dahan's blog, it is really important to understand what your users mean when they ask to be able to delete something. In your case - a few questions to ask:

Has the container gone out of service?
Will it return to service at some point in the future?
Is it now being used in a different part of the facility?
Is it being deleted because the container identifier has changed (e.g. a barcode has rubbed off and a new one has been printed and stuck on)? In that case you might be modelling the identity of the entity incorrectly: the container is the same and only the barcode has changed, so the barcode is not truly the container's identity.
What happens to the substances in a container when it is 'deleted'? Have they been used up? Moved to another container?

'Referential Integrity' via Eventual Consistency

Sometimes things that look like invariants are not really things that absolutely, positively must be enforced at all times. For example, in the unlikely case that all of the above questioning shows you really do need to delete a container, what would happen if there were a slight delay in processing the ramifications of the delete from the perspective of the substances?

Could you publish a domain event ContainerDeleted and in the handler for that event, identify all the associated substances and do what needs doing to them? e.g. mark them as 'uncontained' or whatever makes sense in your domain.
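A rough sketch of that event and handler might look like the following. Only the name ContainerDeleted comes from the text above; DomainEvents, InMemorySubstanceRepository, ContainerDeletedHandler and markUncontained are hypothetical names, and the sketch assumes each substance holds a single containerId. The publisher here is synchronous and in-memory for simplicity; in a real system the handler would typically run after the delete has been committed, for instance via a message broker.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;
import java.util.function.Consumer;

// The domain event; carrying just the container id is an assumption.
record ContainerDeleted(long containerId) {}

// Hypothetical minimal publisher (synchronous, in-memory).
final class DomainEvents {
    private final List<Consumer<ContainerDeleted>> handlers = new ArrayList<>();

    void subscribe(Consumer<ContainerDeleted> handler) { handlers.add(handler); }

    void publish(ContainerDeleted event) { handlers.forEach(h -> h.accept(event)); }
}

final class Substance {
    private final long id;
    private Long containerId; // null means "uncontained"

    Substance(long id, Long containerId) {
        this.id = id;
        this.containerId = containerId;
    }

    long id() { return id; }
    Long containerId() { return containerId; }
    boolean isUncontained() { return containerId == null; }
    void markUncontained() { containerId = null; }
}

// Hypothetical in-memory stand-in for a real SubstanceRepository.
final class InMemorySubstanceRepository {
    private final Map<Long, Substance> byId = new HashMap<>();

    void save(Substance substance) { byId.put(substance.id(), substance); }

    Substance findById(long id) { return byId.get(id); }

    List<Substance> findByContainerId(long containerId) {
        return byId.values().stream()
                .filter(s -> Long.valueOf(containerId).equals(s.containerId()))
                .toList();
    }
}

// Reacts to the delete some time after it happened and updates each
// affected substance in its own aggregate.
final class ContainerDeletedHandler implements Consumer<ContainerDeleted> {
    private final InMemorySubstanceRepository substances;

    ContainerDeletedHandler(InMemorySubstanceRepository substances) {
        this.substances = substances;
    }

    @Override
    public void accept(ContainerDeleted event) {
        for (Substance s : substances.findByContainerId(event.containerId())) {
            s.markUncontained();
            substances.save(s);
        }
    }
}
```

The code that deletes the container only publishes the event and does not need to know which other aggregates care about it, so a new aggregate that references containers later can simply add its own handler.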

This allows you to keep aggregates small by focussing on the things that truly are invariants for that aggregate - Vaughn Vernon's Effective Aggregate Design is great reading for exploring this concept.

Identifying Hidden Concepts in the Model

Sometimes through analysis and 'knowledge crunching' you can identify hidden concepts in the model, that when brought to light and modelled explicitly can simplify your model and business processes. e.g. in your case, a few things that might be useful:

Explicitly model a ContainerPlacement:
- This could be an entity within the storageLocation aggregate: the storageLocation may hold a collection of ContainerPlacement entities.
- ContainerPlacement could just hold a reference to a containerId and perhaps any properties required to enforce the invariants that the storageLocation must maintain. For example, it might hold a copy of the container volume value object so that the storageLocation aggregate can enforce the invariant "don't put more containers in me than will fit in me", whilst leaving most of the other properties of the container (e.g. colour, in-service date, etc.) as the responsibility of the container aggregate.
What is a substance really? Can multiple containers contain the same substance? i.e. if substance is 'water' can multiple containers contain water? Is there a difference between the water in one container and the water in another?
- Perhaps there is a difference between substance as an entity, maintaining the name of the substance and other properties of it (viscosity, density, etc.), and substance as a value object, representing the volume or quantity of the substance within a container.
- This would simplify the model, as the container would then just have a value object on it, perhaps called ContainedSubstance, defining a substanceId and a volume. If containers can have multiple substances in them, you could model it as a collection of such value objects.
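Both ideas can be sketched roughly as follows. ContainerPlacement and ContainedSubstance are the names suggested above; Volume (in millilitres), the capacity field and the method names are assumptions made for illustration, and the persistence mapping is left out.

```java
import java.util.ArrayList;
import java.util.List;

// Value object; millilitres as the unit is an assumption.
record Volume(long millilitres) {
    Volume {
        if (millilitres < 0) throw new IllegalArgumentException("volume must not be negative");
    }

    Volume plus(Volume other) { return new Volume(millilitres + other.millilitres); }

    boolean exceeds(Volume other) { return millilitres > other.millilitres; }
}

// Lives inside the StorageLocation aggregate: a container id plus a copy of
// the one container property the location needs for its invariant.
record ContainerPlacement(long containerId, Volume containerVolume) {}

final class StorageLocation {
    private final Volume capacity;
    private final List<ContainerPlacement> placements = new ArrayList<>();

    StorageLocation(Volume capacity) { this.capacity = capacity; }

    // Invariant: "don't put more containers in me than will fit in me".
    void place(long containerId, Volume containerVolume) {
        if (usedVolume().plus(containerVolume).exceeds(capacity)) {
            throw new IllegalStateException("storage location is full");
        }
        placements.add(new ContainerPlacement(containerId, containerVolume));
    }

    void removePlacement(long containerId) {
        placements.removeIf(p -> p.containerId() == containerId);
    }

    Volume usedVolume() {
        Volume total = new Volume(0);
        for (ContainerPlacement p : placements) {
            total = total.plus(p.containerVolume());
        }
        return total;
    }

    int placementCount() { return placements.size(); }
}

// Value object on the Container aggregate: which substance, and how much of it.
record ContainedSubstance(long substanceId, Volume volume) {}

final class Container {
    private final long id;
    private final List<ContainedSubstance> contents = new ArrayList<>();

    Container(long id) { this.id = id; }

    long id() { return id; }

    void fill(ContainedSubstance substance) { contents.add(substance); }

    List<ContainedSubstance> contents() { return List.copyOf(contents); }
}
```

Note that the capacity check only needs data held inside the StorageLocation aggregate, so the Container aggregate does not have to be loaded to enforce it.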

Separating Query Operations

Some of your requirements are really query requirements - the domain model does not exist to satisfy query requirements - only to enforce invariants under changes.

You might find that, even with the association modelling revealed by the above questions, you can satisfy your queries with a relational database persistence of your domain model. If not, you can also look into maintaining a separate read model to facilitate the queries, whilst leaving your domain models purpose-built to maintain their invariants.
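As an illustration of the second option, a separate read model could be as simple as a flat lookup structure that is kept in sync with the write side and serves the two queries from the question. Everything named here (ContainerLocationRow, ContainerLocationReadModel and its methods) is hypothetical, and the in-memory map stands in for whatever table or view you would actually use.

```java
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Hypothetical flat row shaped for querying, not for enforcing invariants.
record ContainerLocationRow(long containerId, long substanceId, long storageLocationId) {}

final class ContainerLocationReadModel {
    private final Map<Long, ContainerLocationRow> rows = new HashMap<>();

    // Called by whatever keeps the read side in sync, e.g. domain event handlers.
    void onContainerStored(long containerId, long substanceId, long storageLocationId) {
        rows.put(containerId, new ContainerLocationRow(containerId, substanceId, storageLocationId));
    }

    void onContainerDeleted(long containerId) {
        rows.remove(containerId);
    }

    // "In which locations can I find this substance?"
    List<Long> locationsHoldingSubstance(long substanceId) {
        return rows.values().stream()
                .filter(r -> r.substanceId() == substanceId)
                .map(ContainerLocationRow::storageLocationId)
                .distinct()
                .sorted()
                .toList();
    }

    // "Which containers are in this storage location?"
    List<Long> containersIn(long storageLocationId) {
        return rows.values().stream()
                .filter(r -> r.storageLocationId() == storageLocationId)
                .map(ContainerLocationRow::containerId)
                .sorted()
                .toList();
    }
}
```

Because this structure exists only to answer queries, it can be denormalised freely without making the aggregates themselves any larger.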