Justin La France
Justin La France

Reputation: 799

MySQL efficient data to junction table query

The problem:

I want to move the links of the categories from the table companies_1 into the company_categories table. The company_id in the company_categories table need to be equal to the id of the companies_2 table. The records of the companies_1 and the companies_2 table are linked by the "name"-column.

The information:

There is a table with companies like:

----------------------------------------
| companies_1                          |
----------------------------------------
| id   |  category_id   | name         |
----------------------------------------
| 1    |  1             | example-1    |
| 2    |  2             | example-1    |
| 3    |  1             | example-2    |
| 4    |  2             | example-2    |
| 5    |  3             | example-2    |
| 6    |  1             | example-3    |
----------------------------------------

A table with the DISTINCT company names:

-------------------------
| companies_2           |
-------------------------
| id   |   name         |
-------------------------
| 1    |   example-1    |
| 2    |   example-2    |
| 3    |   example-3    |
-------------------------

A categories table, like:

-------------------------
| categories            |
-------------------------
| id   |  name          |
-------------------------

And a junction table, like:

---------------------------------
| company_categories            |
---------------------------------
| company_id   |  category_id   |
---------------------------------

The current code:

This code works, but is far from efficient.

DELIMITER $$
 DROP PROCEDURE IF EXISTS fill_junc_table$$
 CREATE PROCEDURE fill_junc_table()
 BEGIN
 DECLARE r  INT;
 DECLARE i  INT;
 DECLARE i2  INT;
 DECLARE loop_length  INT;
 DECLARE company_old_len  INT;
 DECLARE _href  VARCHAR(255);
 DECLARE cat_id  INT;
 DECLARE comp_id  INT;

 SET r = 0;
 SET i = 0;
 SET company_old_len = 0;
 SELECT COUNT(*) INTO loop_length FROM companies;

 WHILE i  < loop_length DO
  SELECT href INTO _href FROM company_old LIMIT i,1;
  SELECT id INTO comp_id FROM companies WHERE site_href=_href;
  SELECT COUNT(*) INTO company_old_len FROM company_old WHERE href=_href;
  SET i2 = 0;
  WHILE i2  < company_old_len DO
   SELECT category_id INTO cat_id FROM company_old WHERE href=_href LIMIT i2,1;
   INSERT INTO company_categories (company_id, category_id) VALUES (comp_id, cat_id);
   SET  r = r + 1;
   SET  i2 = i2 + 1;
  END WHILE;
  SET  i = i + 1;
 END WHILE;

 SELECT r;
 END$$
DELIMITER ;

CALL fill_junc_table();

Edit (new idea):

I am going to test another way to solve this problem by fully copying the companies_1 table with the following columns (company_id empty on copy):

---------------------------------------------
| company_id   | category_id  |  name       |
---------------------------------------------

Then, I will loop through the companies_2 table to fill the correct company_id related to the name-column.

I hope you can give your thoughts about this. When I finish my test I will leave the result over here for others.

Upvotes: 3

Views: 179

Answers (2)

Nick
Nick

Reputation: 147166

Why not just update companies_1?

ALTER TABLE companies_1 ADD (company_id INT)
UPDATE companies_1 SET company_id = (SELECT id FROM companies_2 WHERE name=companies_1.name)
ALTER TABLE companies_1 DROP name, RENAME TO company_categories
SELECT * FROM `company_categories` 

Output

id  category_id company_id  
1   1           1
2   2           1
3   1           2
4   2           2
5   3           2
6   1           3

Upvotes: 2

KaeL
KaeL

Reputation: 3659

To clarify, I don't see any PIVOT transformation in the company_categories. What I see is you want a JUNCTION TABLE because it seems that companies and categories tables have many-to-many relationship.

In your case, you have company which has multiple categories. And you also have categories assigned to multiple companies.

Now base from your requirement:

I want to move the links of the categories from the table companies_1 into the company_categories table. The company_id in the company_categories table need to be equal to the id of the companies_2 table. The records of the companies_1 and the companies_2 table are linked by the "name"-column.

I arrived with this query:

INSERT INTO company_categories (company_id, category_id)
SELECT C2.id
    , C1.category_id
    FROM companies_1 C1
    INNER JOIN companies_2 C2 ON C2.name = C1.name

Let me know if this works. The nested loops that you created will really take a while.

As @DanielE pointed out, this query will work in the assumption that company_categories is empty. We will need to use UPDATE otherwise.

Upvotes: 2

Related Questions