Hanan Ismail
Hanan Ismail

Reputation: 25

how to eliminate duplicates from my function in postgresql?

I have this vehicle_data table: CREATE TABLE public.vehicle_data

CREATE TABLE vehicle_data
(
  model_name text NOT NULL,
  record_date text NOT NULL,
  inv_quantity integer,
  CONSTRAINT vehicle_data_pkey PRIMARY KEY (model_name, record_date)
)

My table looks like:

model_name  record_date
  car1         5-2015
  car1         1-2016
  car1         2-2015
  car2         2-2017
  car3         8-2016

When I run my function to search for any car, I would like to obtain as a result that orders the entries by month and then by year, so far car1, it should lok like this:

model_name  record_date
  car1         2-2015
  car1         5-2015
  car1         1-2016

Since my record_date is TEXT, I thought that in my function I could split the TEXT array using split_part(record_date,'-',2) to get the year value, store all the unique values in an array, then run my select query for each year.

CREATE OR REPLACE FUNCTION getdata(model text)
   RETURNS TABLE(a text, b text) AS
$BODY$
DECLARE  i int;
     list TEXT[]:= ARRAY(SELECT DISTINCT split_part(record_date,'-',2) as xyz
             from vehicle_data 
             order by xyz);
BEGIN
i:=0;
WHILE i < (select cardinality(list)-1) LOOP 
RETURN QUERY
    select model_name, record_date
    from vehicle_data
    where model_name LIKE model AND split_part(record_date,'-',2) LIKE list[i]
    order by length(record_date), record_date ASC;
i:=i+1;
END LOOP;
RETURN;
END;
$BODY$
  LANGUAGE plpgsql;

Although the function does work, it duplicates the results 68 times, rather than stopping.

Upvotes: 1

Views: 76

Answers (1)

user330315
user330315

Reputation:

To address your immediate question: to iterate over an array use a FOREACH loop, not a WHILE loop. So you should change your function to something like this:

CREATE OR REPLACE FUNCTION getdata(p_model text)
   RETURNS TABLE(a text, b text) AS
$BODY$
DECLARE  
  l_year text;
  l_year_list TEXT[]:= ARRAY(SELECT DISTINCT split_part(record_date,'-',2) as xyz
                             from vehicle_data);
BEGIN
  foreach l_year in array l_year_list loop
    RETURN QUERY
      select model_name, record_date
      from vehicle_data
      where model_name LIKE p_model 
        AND split_part(record_date,'-',2) = l_year
      order by length(record_date), record_date ASC;
  END LOOP;
END;
$BODY$
  LANGUAGE plpgsql;

As no wildcards are involved, I changed the LIKE to an =. I also applied a different naming pattern to the parameters and variables.

But Postgres has really powerful array features in SQL, so the above can be rewritten to a single query without a loop:

CREATE OR REPLACE FUNCTION getdata(p_model text)
   RETURNS TABLE(a text, b text) AS
$BODY$
DECLARE  
  l_year_list TEXT[]:= ARRAY(SELECT DISTINCT split_part(record_date,'-',2) as xyz
                             from vehicle_data);
BEGIN
  RETURN QUERY
    select model_name, record_date
    from vehicle_data
    where model_name LIKE p_model 
      AND split_part(record_date,'-',2) = ANY(l_year_list)
    order by length(record_date), record_date ASC;
END;
$BODY$
  LANGUAGE plpgsql;

This takes all years from the table and returns those rows where the year in the row of the inner quer (only one in case of the second variant) equals to at least one of the values in the array, which is true for all rows, because the values in the array are the all existing years in that column. So at least one of the values in the array will match the one in the row currently being looked at when the query is processed, which in turn means the whole condition isn't really necessary at all.

You can imagine that your function is (logically) being processed like this:

  1. get all years into an array, so the array contains {2015, 2016, 2017}
  2. find all rows where the model name matches. For your example, this leaves us with

    model_name record_date car1 5-2015 car1 1-2016 car1 2-2015

  3. Go through the above rows, and see if the year part of the record_date matches any of the dates in the array. That condition will be always true, as the array contains all possible values for that column. So the condition doesn't remove anything from the result.

Which in turn means, that your query is equivalent to:

select vd1.model_name, vd1.record_date
from vehicle_data vd1
where vd1.model_name LIKE = 'car1'
order by split_part(record_date, '-', 2)::int, 
         split_part(record_date, '-', 1)::int;

Upvotes: 2

Related Questions