nomadoda
nomadoda

Reputation: 4942

Left join returns duplicate rows

I am just learning SQL and I'm really struggling to understand why my left join is returning duplicate rows. This is the query I'm using:

SELECT "id", "title"
FROM "posts"
LEFT JOIN "comments" "comment"
ON "comment"."post_id"="id" AND ("comment"."status" = 'hidden')

It returns 4 rows, but should only return 3. Two of the returned rows contain are duplicate (same values). I can fix this by using the DISTINCT prefix on "id".

SELECT DISTINCT "id", "title"
FROM "posts"
LEFT JOIN "comments" "comment"
ON "comment"."post_id"="id" AND ("comment"."status" = 'hidden')

The query returns 3 rows and I get desired result. But I'm still wondering why in the world I would get a duplicate row from the first query in the first place? I'm trying to write an aggregation query and this seems to be the issue I'm having.

I'm using PostgreSQL.

More specific: (as created by my ORM)

Shift DDL

CREATE TABLE shift (
    id uuid DEFAULT uuid_generate_v4() PRIMARY KEY,
    "gigId" uuid REFERENCES gig(id) ON DELETE CASCADE,
    "categoryId" uuid REFERENCES category(id),
    notes text,
    "createdAt" timestamp without time zone NOT NULL DEFAULT now(),
    "updatedAt" timestamp without time zone NOT NULL DEFAULT now(),
    "salaryFixed" numeric,
    "salaryHourly" numeric,
    "salaryCurrency" character varying(3) DEFAULT 'SEK'::character varying,
    "staffingMethod" character varying(255) NOT NULL DEFAULT 'auto'::character varying,
    "staffingIspublished" boolean NOT NULL DEFAULT false,
    "staffingActivateon" timestamp with time zone,
    "staffingTarget" integer NOT NULL DEFAULT 0
);

ShiftEmployee DDL

CREATE TABLE "shiftEmployee" (
    "employeeId" uuid REFERENCES employee(id) ON DELETE CASCADE,
    "shiftId" uuid REFERENCES shift(id) ON DELETE CASCADE,
    status character varying(255) NOT NULL,
    "updatedAt" timestamp without time zone NOT NULL DEFAULT now(),
    "salaryFixed" numeric,
    "salaryHourly" numeric,
    "salaryCurrency" character varying(3) DEFAULT 'SEK'::character varying,
    CONSTRAINT "PK_6acfd2e8f947cee5a62ebff08a5" PRIMARY KEY ("employeeId", "shiftId")
);

Query

SELECT "id", "staffingTarget" FROM "shift" LEFT JOIN "shiftEmployee" "se" ON "se"."shiftId"="id" AND ("se"."status" = 'confirmed');

Result

id                                      staffingTarget
68bb0892-9bce-4d08-b40e-757cb0889e87    3
12d88ff7-9144-469f-8de5-3e316c4b3bbd    6
73c65656-e028-4f97-b855-43b00f953c7b    5
68bb0892-9bce-4d08-b40e-757cb0889e88    3
e3279b37-2ba5-4f1d-b896-70085f2ba345    4
e3279b37-2ba5-4f1d-b896-70085f2ba346    5
e3279b37-2ba5-4f1d-b896-70085f2ba346    5
789bd2fb-3915-4cda-a3d7-2186cf5bb01a    3

Upvotes: 0

Views: 1111

Answers (1)

user330315
user330315

Reputation:

If a post has more than one hidden comment, you will see that post multiple times because a join returns one row for each match - that's the nature of a join. And an outer join doesn't behave differently.

If your intention is to list only posts with hidden comments, it's better to use an EXISTS query instead:

SELECT p.id, p.title
FROM posts p
where exists (select *
              from comments c 
              where c.post_id = p.id 
                and c.status = 'hidden');

Upvotes: 2

Related Questions