ptyskju
ptyskju

Reputation: 315

How to store different types of address data in db?


I have to create a database combined with 4 types of xls files, for example A, B, C and D. Every year new file is created, starting from 2004. A have 7 sheets with 800-1000 rows, B - D have one sheet with max 200 rows.

Everyone knows that people are lazy, so in excel files, address data are stored differently in each sheet. One of them, from 2008, have address data stored separately, but every other sheets have this data combined into one column.

Sooo, here is a question - how should I design a datatable? Something like this?

+---------+----------+----------+-------------+--------------------------------+
| Street  | House Nr |   City   | Postal Code |        Combined Address        |
+---------+----------+----------+-------------+--------------------------------+
| Street1 | 20       | Somwhere | 00-000      | null                           |
| Street2 | 98       | Elswhere | 99-999      | null                           |
| null    | null     | null     | null        | Somwhere 00-000, street3 29    |
| null    | null     | null     | null        | st. Street2 65 12-345 Elswhere |
+---------+----------+----------+-------------+--------------------------------+

There will be a lot of nulls, so maybe best solution would be 2 different tables?

Most important thing is that users will search by using this data, and in the future add data into that database without excel files.

Upvotes: 1

Views: 100

Answers (1)

MyBrainHurts
MyBrainHurts

Reputation: 2630

There are at least two different angles of view here: Normalization and efficiency, leading to different results.

Normalization

If this is the most important criterion you would make even three tables. Obviously Combined Address needs a place of it's own, but also Postal Code and City have to be stored into another table, because there is a dependency between them. Just one of the two, most probably Postal Code will stay. (Yes, there even is sth. about Street and Postal Code too, but I'm clearly not going to be pedantic.)

Efficiency

Normalization as an end in itself doesn't necessarily make the best result. If you permit yourself to be a bit sloppy on that and leave it the way it is in the model you posted, things might become easier in coding. You could use a trigger to make sure Combined Address is never null or use a (materialized) view that pretends it is and just search in Combined Address for the time being.
Imagine the effort if you use different tables and there is a need for referencing these addresses in other tables (Which table to use when? How to provide a unique id? Clearly a problem.).

So, decide on what's more important.

If I'm not mistaken we are taking about some 2000 rows or some 8000 rows if it is '7 sheets with 800-1000 rows each' actually. Even if the latter applies this isn't a number that makes data correction impracticable. If the number of different input pattern in the combined column is low, you might be able to do this (partly) automatically and just have some-one prove reading.

So you might want to think about a future redesign as well and choose what's more convenient in this case.

Upvotes: 2

Related Questions