kmeans clustering using PL/R in postgresql

Question

I am trying to use kmeans function in PL/R. I have created following table

CREATE TABLE EMP (NAME1 TEXT, AGE SMALLINT, SALARY NUMERIC );

INSERT INTO EMP VALUES 
('Joe', 41, 55000),
('Jill', 27, 25000),
('Jack', 31, 45000),
('Joker', 65, 20000),
('Joy', 22, 31000),
('Jane', 72, 35000),
('Jackson', 42, 65000),
('Jessica', 23, 37000);

My clustering function is given below

CREATE OR REPLACE FUNCTION CLUS(sal NUMERIC[])
RETURNS INTEGER[] AS
$BODY$

a <- NA
a = kmeans(sal, 3)$cluster
return(a)

$BODY$
LANGUAGE 'plr' ;

When I am executing the following query

SELECT * , CLUS(ARRAY (SELECT SALARY FROM EMP ORDER BY SALARY)) AS CLUSTER_ID from emp order by salary;

My output is

name1   age salary  cluster_id
Joker   65  20000   {3,3,2,2,2,2,1,1}
Jill    27  25000   {2,2,2,2,2,3,3,1}
Joy     22  31000   {3,3,2,2,2,2,1,1}
Jane    72  35000   {1,1,1,1,1,2,2,3}
Jessica 23  37000   {1,1,2,2,2,2,3,3}
Jack    31  45000   {2,2,3,3,3,3,1,1}
Joe     41  55000   {1,1,3,3,3,3,2,2}
Jackson 42  65000   {2,2,3,3,3,3,1,1}

Problems I am facing are, every time a row is evaluated clusters are changing and I want the following output …

name1   age salary  cluster_id
Joker   65  20000   3
Jill    27  25000   3
Joy     22  31000   2
Jane    72  35000   2
Jessica 23  37000   2
Jack    31  45000   2
Joe     41  55000   1
Jackson 42  65000   1

please let me know if it can be done in a better and simple way

kmeans clustering using PL/R in postgresql

Answers (1)

Related Questions