Reputation: 1508
I have one table. How can I normalize this.
Upvotes: 0
Views: 1543
Reputation: 14399
Even before we start, I'll mention each step's rules first because I don't know the dependencies in your table. Stating the rules and my assumptions about your data will clarify how I arrived at the final solution.
That way, for the least, you will take away the understanding of how normalization works and can go ahead with doing it by yourself on the data that only you understand the best.
Now, that the basic designing rule is in place, it is easy to see that your single table actually has four separate entities fused together. They are:
1. Employee - who has id and name
2. Role - which can be Engineer, Tech Support etc
3. Department - which can be Software, Hardware, Department (could use better word though), Voice etc.
4. Salary - which will have EmpId, DateOfChange and Amount. This is because employees have different salaries and also the salaries of same employee changes over time.
So, we are going to break this table to four tables as mentioned above. Out of the four tables, Employee table has id as the primary key, Role and Department need synthetic (maybe even auto-incrementing) keys and Salary will have {EmpId, DataOfChange} together as the primary key. This will look like:
Table Name Columns
Employee Id, Name, RoleId, DeptId
Role RoleId, RoleType
Department DeptId, DeptName
Salary EmpId, DateOfChange, Amount
All the above table can have more items. I'm trying to hit the design that has minimalistic difference from that table you have already given. Like Salary table can also have a field like ReasonOfSalaryChange which could have values like NewHiring, Promotion etc... But we hold the changes to minimal.
Now we can move to actual normalization of your data.
I never had to use anything above 3NF. So I'll limit the explanation below till the third normal form.
1NF: simply states that all columns must have atomic values. If a column needs more than one value, create another table. Your new tables pass that test. All values in your table's columns are atomic.
2NF: requires 1NF qualification and that any non-key field should be dependent on the entire primary key. All your non-key fields (Name in Employee, RoleType in Role, DeptName in Department, Amount in Salary) depend on the respective table's primary key (Id, RoleId, DeptId and {EmpId, DateOfChange}). So the tables qualify 2NF fine.
3NF: requires 2NF qualification and that no non-key field should depend upon any other non-key field. This means that there should be no dependencies between table's columns except on the primary keys. Role, Department and Salary tables are default 3NF qualified as they have only one non-key column and it's dependent on PK. Employee table, you can verify yourself, has no non-key element that is dependent on any other non-key element. So the tables are perfectly 3NF qualified the way they currently are.
All that's left now is to define that RoleId, DeptId and EmpId are foreign keys from the Role, Department and Employee table respectively.
Upvotes: -1
Reputation: 1320
Good question given your data set. Keep in mind the whole point of normalization is to reduce duplication. 3NF is often the best way to go. But given my experience I've found very little benefit to pulling a repeated value out into a different table if it will be the only value in that table. Take for example your most duplicated column, emp_type. If you were to normalize it into a separate table , it would look like this:
Emp_Type_Id | Emp_type
------------------------
1 | Manager
2 | Engineer
3 | Tech Support
And your current table would look like:
Emp_ID | Emp_Name | Salary | Emp_Type_Id | Emp_Skill
----------------------------------------------------
1 | raj | 90000 | 1 | Department
2 | ravi | 50000 | 2 | Software
3 | shyam | 70000 | 2 | Hardware
.
.
.
This is technically more normalized than before as the emp_type value is no longer repeated in your database. But it's significantly more hassle to deal with a separate table and a relationship for a single value. If emp_type included other information as well such as Valid_Salary_Range, Department_Location, etc. then it would absolutely be best to normalize it out into a separate table. But if it's only a single value you are normalizing, what's the difference between storing the value as opposed to storing an ID pointed to that value? In my opinion it's a pointless extra step.
Long story short, I would not normalize your table at all, it's already normalized to a sufficient level.
Upvotes: 3