Tony the Pony
Tony the Pony

Reputation: 41347

Best way to model Customer <--> Address

Every Customer has a physical address and an optional mailing address. What is your preferred way to model this?

Option 1. Customer has foreign key to Address

   Customer   (id, phys_address_id, mail_address_id)
   Address    (id, street, city, etc.)

Option 2. Customer has one-to-many relationship to Address, which contains a field to describe the address type

   Customer   (id)
   Address    (id, customer_id, address_type, street, city, etc.)

Option 3. Address information is de-normalized and stored in Customer

   Customer   (id, phys_street, phys_city, etc. mail_street, mail_city, etc.)

One of my overriding goals is to simplify the object-relational mappings, so I'm leaning towards the first approach. What are your thoughts?

Upvotes: 24

Views: 17192

Answers (12)

U4EA
U4EA

Reputation: 912

Good thread. I have spent a while contemplating the most suitable schema and I have concluded that quentin-starin's solution is the best except I have added start_date and end_date fields to what would be his PersonAddress table. I have also decided to add notes, active and deleted.

deleted is for soft delete functionality as I think I do not want to lose trace of previous addresses simply by deleting the record from the junction table. I think that is quite wise and something others may want to consider. If not done this way, it could be left to revision of paper or electronic documents to try to trace address information (something best avoided).

notes I think of being something of a requirement but that might just be preference. I've spent time in backfill exercises verifying addresses in databases and some addresses can be very vague (such as rural addresses) that I think it is very useful to at least allow notes about that address to be held in the record address.

One thing i would like to hear opinions on is the unique indexing of the address table (again, referring to the table of the same name in quentin-starin's example. Do you think it should be unique index should be enforced (as a compound index presumably across all not-null/required fields)? This would seem sensible but it might still be hard to stop duplicate data regardless as postal/zip codes are not always unique to a single property. Even if the country, province and city fields are populated from reference data (which they are in my model), spelling differences in the address lines may not match up. The only way to best avoid this might be to run one or a number of DB queries from the incoming form fields to see if a possible duplicate has been found. Another safety measure would be give the user the option of selecting from address in the database already linked to that person and use that to auto-populate. I think this might be a case where you can only be sensible and take precautions to stop duplication but just accept it can (and probably will) happen sooner or later.

The other very important aspect of this for me is future editing of the address table records. Lets say you have 2 people both listed at: -

11 Whatever Street Whatever City Z1P C0D3

Should it not be considered dangerous to allow the same address table record to be assigned to different entities (person, company)? Then let's say the user realises one of these people lives at 111 Whatever Street and there is a typo. If you change that address, it will change it for both of the entities. I would like to avoid that. My suggestion would be to have the model in the MVC (in my case, PHP Yii2) look for existing address records when a new address is being created known to be related to that customer (SELECT * FROM address INNER JOIN personaddress ON personaddress.address_id = address.id WHERE personaddress.person_id = {current person being edited ID}) and provide the user the option of using that record instead (as was essentially suggested above).

I feel linking the same address to multiple different entities is just asking for trouble as it might be a case of refusing later editing of the address record (impractical) or risking that the future editing of the record may corrupt data related to other entities outside of the one who's address record is being edited.

I would love to hear people's thoughts.

Upvotes: 1

max
max

Reputation: 29983

In most code I write nowadays every customer has one and only one physical location. This is the legal entity beeing our business partner. Therefore I put street, city etc in the customer object/table. Often this is the possible simplest thing that works and it works.

When an additional mailing address is needed, I put it in a separate object/table to not clutter the customer object to much.

Earlier in my career I normalized like mad having an order referencing a customer which references a shipping address. This made things "clean" but slow and inelegant to use. Nowadays I use an order object which just contains all the address information. I actually consider this more natural since a customer might change his (default?) address, but the address of a shipment send in 2007 should always stay the same - even if the customer moves in 2008.

We currently implement the VerySimpleAddressProtocol in out project to standardize the fields used.

Upvotes: 2

quentin-starin
quentin-starin

Reputation: 26628

We are moving forward with a model like this:

Person (id, given_name, family_name, title, suffix, birth_date)
Address (id, culture_id, line1, line2, city, state, zipCode, province, postalCode)
AddressType (id, descriptiveName)
PersonAddress (person_id, address_id, addressType_id, activeDates)

Most may consider this excessive. However, an undeniable common theme amongst the apps we develop is that they will have some of these fundamental entities - People, Organizations, Addresses, Phone Numbers, etc.. - and they all want to combine them in different ways. So, we're building in some generalization up-front that we are 100% certain we have use cases for.

The Address table will follow a table-per-hierarchy inheritance scheme to differentiate addresses based on culture; so a United States address will have a state and zip field, but Canadian addresses will have a province and postal code.

We use a separate connecting table to "give" a person an address. This keeps our other entities - Person & Address - free from ties to other entities when our experience is this tends to complicate matters down the road. It also makes it far simpler to connect Address entities to many other types of entities (People, Organizations, etc.) and with different contextual information associated with the link (like activeDates in my example).

Upvotes: 7

Tobias Hertkorn
Tobias Hertkorn

Reputation: 2750

When answering those kinds of questions I like to use the classifications of DDD. If it's a Entity it should have a separate ID, if it's a value object it should not.

Upvotes: 3

Greg Beech
Greg Beech

Reputation: 136587

One important fact you may need to consider (depending on your problem domain) is that people change addresses, and may want to let you know in advance of their address change; this is certainly true for utility companies, telcos, etc.

In this case you need to have a way to store multiple addresses for the customer with validity dates, so that the address can be set up in advance and automatically switch at the correct point. If this is a requirement, then a variation on (2) is the only sensible way to model it, e.g.

Customer (id, ...)
Address (id, customer_id, address_type, valid_from, valid_to)

On the other hand, if you don't need to cater for this (and you're sure you won't in the future) then probably (1) is simpler to manage because it's much easier to maintain data integrity as there's no issues with ensuring only one address of the same type exists, and the joins become simpler as they're only on one field.

So either (1) or (2) are fine depending on whether you need house-moves, but I'd steer clear of (3) because you're then repeating the definition of what an address is in the table, and you'll have to add multiple columns if you change what an address looks like. It's possibly slightly more performant, but to be honest when you're dealing with properly indexed joins in a relational database there isn't a lot to be gained, and it's likely to be slower in some scenarios where you don't need the address as the record size for a customer will be larger.

Upvotes: 7

Thomas Müller
Thomas Müller

Reputation: 15625

I would go with option 1. If you want to, you could even modify it a little bit to keep an address history:

Customer   (id, phys_address_id, mail_address_id)
Address    (id, customer_id, start_dt, end_dt, street, city, etc.)

If the address changes, just end date the current address and add a new record in the Address table. The phys_address_id and mail_address_id always point to the current address.

That way you can keep a history of addresses, you could have multiple mailing addresses stored in the database (with the default in mail_address_id), and if the physical address and mailing address are identical you'll just point phys_address_id and mail_address_id at the same record.

Upvotes: 1

cdonner
cdonner

Reputation: 37658

Option 3 is too restrictive, and option 1 cannot be extended to allow for other address types without changing the schema. Option 2 is clearly the most flexible and therefore the best choice.

Upvotes: 2

Markus
Markus

Reputation: 1822

Like in many cases: It depends.

If your customers deal with multiple addresses then a to-many relationship would be appropriate. You could introduce a flag on address that signals if an address is for shipment or bill, etc. Or you store the different address types in different tables and have multiple to-one relationships on a customer.

In cases where you only need to know one address of a customer why would you model that to-many? A to-one relationship would satisfy your needs here.

Important: Denormalize only if you encounter performance issues.

Upvotes: 1

krosenvold
krosenvold

Reputation: 77121

I'd prefer #1. Good normalization and communicates intent clearly. This model also allows the same address object (row) to be used for both addresses, something I have found to be quite valuable. It's far too easy to get lost in duplicating this information too much.

Upvotes: 3

Karl
Karl

Reputation: 3372

I tend towards first approach for all the usual reasons of normalisation. This approach also makes it easier to perform data cleansing on mailing details.

If you are possibly going to allow multiple addresses (mail, residential, etc) or wish to be able to use effective dates, consider this approach

   Customer   (id, phys_address_id)
   Cust_address_type (cust_id, mail_address_id, address_type, start_date, end_date)
   Address    (id, street, city, etc.)

Upvotes: 11

James Gregory
James Gregory

Reputation: 14223

I'd go for the first option. In these situations I'm very weary of YAGNI (you aren't going to need it). I can't count the number of times I've looked at schemas that've had one-to-many tables "just incase" that are many years old. If you only need two, just use the first option; if the requirement changes in the future, change it then.

Upvotes: 1

Erling Thorkildsen
Erling Thorkildsen

Reputation: 111

The second option would probably be the way I would go. And on the off-chance it would let users add additional address' (If you wanted to let them do that), that they could switch between at will for shipping and such.

Upvotes: 4

Related Questions