Reputation: 3919
I'll try to explain my problem as well as I can.
I have to compare data of same type, saved in a MySql database but containing a varying number of values.
I build my database like this (probably not the best):
--
-- Table structure for table `amount`
--
CREATE TABLE `amount` (
`id` tinyint(1) UNSIGNED NOT NULL,
`value` tinyint(1) NOT NULL
) ENGINE=InnoDB DEFAULT CHARSET=utf8;
--
-- Dumping data for table `amount`
--
INSERT INTO `amount` (`id`, `value`) VALUES
(34, 1),
(22, 2),
(30, 6),
(21, 7),
(9, 8),
(17, 9),
(10, 10),
(15, 11),
(3, 12),
(4, 13),
(8, 14),
(5, 15),
(16, 16),
(13, 17),
(6, 18),
(20, 19),
(7, 20),
(23, 21),
(18, 22),
(19, 23),
(24, 24),
(14, 25),
(25, 26),
(26, 27),
(28, 28),
(29, 29),
(11, 30),
(27, 31),
(12, 32),
(31, 33),
(32, 35),
(33, 36),
(2, 98),
(1, 99);
-- --------------------------------------------------------
--
-- Table structure for table `mark`
--
CREATE TABLE `mark` (
`id` tinyint(1) UNSIGNED NOT NULL,
`name` varchar(16) NOT NULL
) ENGINE=InnoDB DEFAULT CHARSET=utf8;
--
-- Dumping data for table `mark`
--
INSERT INTO `mark` (`id`, `name`) VALUES
(1, 'A'),
(2, 'B'),
(3, 'C'),
(4, 'D'),
(5, 'E'),
(6, 'F'),
(7, 'G'),
(8, 'H'),
(9, 'I'),
(10, 'J')
(11, 'K')
(12, 'L')
(13, 'M')
(14, 'N')
(15, 'O');
-- --------------------------------------------------------
--
-- Table structure for table `profile`
--
CREATE TABLE `profile` (
`id` smallint(2) UNSIGNED NOT NULL,
`run` smallint(2) NOT NULL,
`deleted` datetime DEFAULT NULL,
`created` datetime NOT NULL,
`validated` datetime DEFAULT NULL
) ENGINE=InnoDB DEFAULT CHARSET=utf8;
-- --------------------------------------------------------
--
-- Table structure for table `profile_mark`
--
CREATE TABLE `profile_mark` (
`id` int(11) NOT NULL,
`id_profile` smallint(2) UNSIGNED NOT NULL,
`id_mark` tinyint(1) UNSIGNED NOT NULL,
`id_amount` tinyint(1) UNSIGNED NOT NULL
) ENGINE=InnoDB DEFAULT CHARSET=utf8;
Each time new data arrive, a profile is created and if necessary a new mark and a new amount.
It means a profile can have up to 48 values. I plan to have at least 20000 profile in database in the future.
My Goal: If I take one profile, I must find all other profiles which have at least 1 common value for X of their marks. (where X is the number of marks minimum which must match)
At the moment, I take all profiles one by one to compare it to the tested one. It takes a some time (I have only about 50 profile at the moment in database) and it's not a good solution for the future of my application.
Another solution I imagine is to cache (or save in database), all profile id for each mark_amount association... but It seems not a good idea :(
I need some advise to optimize this comparison please. (I'm open to other database, cache system than php/mysql etc...)
EDIT1: example of profile matching or not on 8 marks
Upvotes: 2
Views: 79
Reputation: 5570
The is question needs more details but I see a couple of general improvements that'll be useful here : First, I don't see any indexes, please make PRIMARY KEY for every id Example :
CREATE TABLE `mark` (
`id` tinyint(1) UNSIGNED NOT NULL PRIMARY KEY,
...
If it's too late to change the tables , use CREATE INDEX
Second, for consistency sake make REFERENCES to declare foreign key dependency example:
FOREIGN KEY (id_mark) REFERENCES mark(id)
Finally, run EXPLAIN
statement on your query, and see what you can improve according to the result (you can add index for queries that are frequently used to )
EXPLAIN SELECT ...
Upvotes: 0
Reputation: 37108
The query to return all profile_mark.id_profile
that have exactly @matched_marks
marks with at least 1 same amount as the profile with the given @target_profile_id
:
SELECT `match`.id_profile, count(*) as X FROM (
SELECT DISTINCT `all`.id_profile, `all`.id_mark FROM profile_mark as `all`
INNER JOIN profile_mark as `one`
ON `one`.id_mark = `all`.id_mark
AND `one`.id_amount = `all`.id_amount
WHERE `all`.id_profile <> @target_profile_id
AND `one`.id_profile = @target_profile_id
) as `match`
GROUP BY 1
HAVING X = @matched_marks; // can be >= if you need at least X matching marks
As a side note, id_profile smallint(2)
seems insufficient for at least 20000 profiles.
Upvotes: 2