Jared Stewart
Jared Stewart

Reputation: 616

Are guaranteed null fields an indication of poor database design?

I am working on a batch processing application that allows users to submit requests for information about particular vehicles. Users can submit a request either using a VIN or a License Plate/State combination. I proposed the following table structure:


VehiclesToBeProcessed

vehicle_id(fk)|user_id(fk)|status|start_time

Vehicles

vehicle_id|VIN|plate|state


My colleague argued that this was poor design, because every record in Vehicles would either have a null VIN field, or null plate and state fields. Instead, they proposed the following:
VehiclesToBeProcessed

vehicle_id(fk)|user_id(fk)|status|start_time

Vehicles

vehicle_id(pk)|field|value

where an entry in Vehicles would either consist of one row for a vin:

1|"vin"|"123

or two rows for a plate/state:

2|"plate"|"abc 123" 2|"state"|"NY"


I thought that the first solution would be much easier to query without having any significant downside. Which design should be preferred? Are guaranteed null fields really an indicator of bad design?

Upvotes: 2

Views: 87

Answers (4)

Neil McGuigan
Neil McGuigan

Reputation: 48246

Nulls are fine. They are particularly useful for Single Table Inheritance, and if your system needs "Draft" entities.

If you use a quality database like Postgres, there is no storage penalty for nulls.

Anyways, if the problem is "we need A OR B, and A and B are pretty darn similar" then the answer is almost always Table Inheritance. If you want to move fast then use Single Table Inheritance. If NULLs make you sad, then use Class Table Inheritance.

--STI:
create table vehicle_identifiers (
  id int primary key,
  type text not null check (type in ( 'VIN', 'STATE_N_PLATE' ) )
  vin null,
  state char(2) null,
  plate text null,

  check ( ( type='VIN' and vin is not null ) or ( type='STATE_N_PLATE' and state is not null and plate is not null ) )
);

--CTI:
create table vehicle_identifiers (
  id int primary key
);

create table vehicle_identifiers_vin (
  id int primary key references vehicle_identifiers(id),
  vin text not null
);

create table vehicle_identifiers_state_n_plate (
  id int primary key references vehicle_identifiers(id),
  state text not null,
  plate text not null
);

Upvotes: 0

Erwin Smout
Erwin Smout

Reputation: 18408

What your colleague proposed is about the ultimate antipattern in database design.

Google for Bill Karwin's "antipatterns" book and for "EAV".

Ask your colleague how he proposes to enforce that "plate" and "state" values always appear in pairs in his database. If he points to the application code, ask him how he proposes to enforce that the database will only ever get updated through his application.

Your solution is a thousand times better than his. Still "better" (from the perspective of relational purity which involves avoiding all nulls) is to give each type of request its own table :

VehicleQueriesByVIN

user_id(fk)|status|start_time|VIN

VehicleQueriesByPlate

user_id(fk)|status|start_time|plate|state

If historical trace is to be kept of the statuses over time for each query, that stuff has to be singled out in its own table.

Upvotes: 2

Ross Bush
Ross Bush

Reputation: 15175

Think of it as multiple ways to identify a vehicle. You vehicle has one or many identities. The local police may identify your vehicle using LPN while the parking authority may use permit badges or active/passive transponders, furthermore the dmv probably relies on vrn numbers.

If you really want to build a flexible way to bind a vehicle to multiple identities I would use an Identity type table so a vehicle can have one or many identities.

VehicleIdentity
VehicleIdentity PK
VehicleID FK
IdentityValue
IdentityType (Type)
StateID??

Vehicle
VehicelID PK

I updated the answer by removing a table that I see would be of no use :)

Upvotes: 0

Ernest Friedman-Hill
Ernest Friedman-Hill

Reputation: 81684

In a word: no. This is a case of misplaced optimization. His schema will actually take up more space on average, due to storing the strings; and of course the more complex code and queries will have worse performance.

Upvotes: 0

Related Questions