Reputation: 419
source | target
apple | dog
dog | cat
door | cat
dog | apple
cat | dog
(step 1)
Running this SQL query:
SELECT GREATEST(source,target), LEAST(source,target), COUNT(*) FROM my_table GROUP BY GREATEST(source,target), LEAST(source,target);
gives the count for each unordered pair:
apple dog 2
dog cat 2
door cat 1
(step 2)
Now I want to compute the probability of each pair (its count divided by the sum of all counts) and store it in a column named "prob", like this:
source | target | prob
apple | dog | 2/(2+2+1)
dog | cat | 2/(2+2+1)
door | cat | 1/(2+2+1)
dog | apple | 2/(2+2+1)
cat | dog | 2/(2+2+1)
(step 3)
How can I get from step 1 to step 3?
Upvotes: 0
Views: 55
Reputation: 33945
DROP TABLE IF EXISTS my_table;
CREATE TABLE my_table
(source VARCHAR(12) NOT NULL,target VARCHAR(12) NOT NULL
,PRIMARY KEY(source,target)
);
INSERT INTO my_table VALUES
('apple','dog'),
('dog','cat'),
('door','cat'),
('dog','apple'),
('cat','dog');
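-- prob = (rows sharing this unordered pair) / (total rows in my_table)
SELECT x.*
, y.total/(SELECT COUNT(*) FROM my_table) prob
FROM my_table x
JOIN
-- y holds one row per unordered pair, with its count
( SELECT GREATEST(source,target) g,LEAST(source,target) l,COUNT(*) total FROM my_table GROUP BY g,l ) y
-- match each row to its pair regardless of direction
ON (y.g = x.source AND y.l = x.target)
OR (y.g = x.target AND y.l = x.source);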
+--------+--------+--------+
| source | target | prob   |
+--------+--------+--------+
| apple  | dog    | 0.4000 |
| dog    | apple  | 0.4000 |
| cat    | dog    | 0.4000 |
| dog    | cat    | 0.4000 |
| door   | cat    | 0.2000 |
+--------+--------+--------+
...or something like that
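The question also asks to store the result in a "prob" column, which the SELECT above does not do. One possible way, as an untested sketch assuming MySQL: add the column, then run a multi-table UPDATE against the same per-pair counts. The DECIMAL(5,4) type and the aliases y, t and n are my own assumptions, not something from the question.

```sql
-- Assumption: prob does not exist yet; DECIMAL(5,4) is an illustrative type choice.
ALTER TABLE my_table ADD COLUMN prob DECIMAL(5,4) NULL;

UPDATE my_table x
JOIN
     -- y: one row per unordered pair, with how many rows share it
     ( SELECT GREATEST(source,target) g
            , LEAST(source,target) l
            , COUNT(*) total
         FROM my_table
        GROUP BY g, l
     ) y
  ON y.g = GREATEST(x.source,x.target)
 AND y.l = LEAST(x.source,x.target)
CROSS JOIN
     -- t: a single row holding the total row count (the denominator)
     ( SELECT COUNT(*) n FROM my_table ) t
SET x.prob = y.total / t.n;
```

Both derived tables contain an aggregate, so MySQL should materialize them first, which is what allows reading from my_table while updating it. Afterwards, SELECT * FROM my_table should show the same prob values as the result above.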
Upvotes: 1