U W

Reputation: 1290

Tag all rows with the same values in Redshift SQL

Let's say I have a table as follows:

CREATE TABLE "test_ids" (
   var1 TEXT,
   var2 TEXT,
   var3 TEXT,
   time INT
);

INSERT INTO "test_ids" VALUES
('d1', 'i1', 'f1', 1),
('d2', 'i1', 'f1', 2),
('d2', 'i2', 'f1', 3),
('d1', 'i1', 'f2', 4),
('d3', 'i3', 'f3', 1),
('d3', 'i4', 'f3', 2),
('d3', 'i5', 'f3', 3),
('d3', 'i3', 'f4', 4),
('d4', 'i1', 'f5', 5);

There is a many-to-many-to-many relationship between the columns var1, var2 and var3. Each combination is stored on its own row, along with the time at which that combination was first observed.

However, there is a one-to-many relationship between a user and a variable. That is, one variable value (be it in var1, var2 or var3) can only ever be observed by one user.

I want to produce a table that applies a unique user_id to all rows that we work out belong to one user, that is, rows linked directly or through other rows by a shared var1, var2 or var3 value.

In this example, the output would be:

 user | var1 | var2 | var3 | time 
------+------+------+------+------
 u1   | d1   | i1   | f1   |    1
 u1   | d2   | i1   | f1   |    2
 u1   | d2   | i2   | f1   |    3
 u1   | d1   | i1   | f2   |    4
 u2   | d3   | i3   | f3   |    1
 u2   | d3   | i4   | f3   |    2
 u2   | d3   | i5   | f3   |    3
 u2   | d3   | i3   | f4   |    4
 u1   | d4   | i1   | f5   |    5  

Don't worry about how to make the user ID unique - I can do that part. I'm struggling to work out the recursive joining of IDs, however.

Upvotes: 1

Views: 406

Answers (1)

Gluz

Reputation: 3264

Assuming the following case is possible:

d1, i1, f1
d1, i2, f2
d2, i2, f3
d3, i3, f3

And assuming you would like to attribute all of these rows to the same user, then my answer is that you can't. Some more mature databases have a tree query, but even that type of query won't work here.

If this is only a theoretical question, you can stop here. If you would like a practical answer, I would either run an algorithm in some programming language or, better (at least performance-wise), implement a solution in the operational system that adds these rows to the DB: every time a new row arrives, it checks whether one of its keys is already attributed to a user. If it is, the row keeps that user ID; if not, it is assigned a new user ID.
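As a rough illustration of the "run an algorithm in a programming language" option, here is a minimal Python sketch. It uses a union-find (disjoint-set) structure, which is just one possible choice and not something prescribed above. The function name `assign_users`, the `ROWS` list and the `u1`, `u2` labels are hypothetical. The idea is to treat every column value as a node, link the three values on each row, and give every connected group one label.

```python
def assign_users(rows):
    """Label each (var1, var2, var3, time) row with a user id.

    Rows that share any value, directly or through a chain of other
    rows, end up with the same label. Labels are illustrative only.
    """
    parent = {}

    def find(x):
        # Register unseen nodes as their own root, then walk up to the root.
        parent.setdefault(x, x)
        while parent[x] != x:
            parent[x] = parent[parent[x]]  # path halving keeps trees shallow
            x = parent[x]
        return x

    def union(a, b):
        root_a, root_b = find(a), find(b)
        if root_a != root_b:
            parent[root_b] = root_a

    # Pass 1: link the three values that appear together on a row.
    for var1, var2, var3, _ in rows:
        # Tag values with their column so 'x' in var1 never collides
        # with 'x' in var2 (an assumption about the data).
        union(("var1", var1), ("var2", var2))
        union(("var1", var1), ("var3", var3))

    # Pass 2: give each connected group a label in order of first appearance.
    labels = {}
    result = []
    for var1, var2, var3, time in rows:
        root = find(("var1", var1))
        user = labels.setdefault(root, f"u{len(labels) + 1}")
        result.append((user, var1, var2, var3, time))
    return result


# Sample data copied from the question.
ROWS = [
    ("d1", "i1", "f1", 1),
    ("d2", "i1", "f1", 2),
    ("d2", "i2", "f1", 3),
    ("d1", "i1", "f2", 4),
    ("d3", "i3", "f3", 1),
    ("d3", "i4", "f3", 2),
    ("d3", "i5", "f3", 3),
    ("d3", "i3", "f4", 4),
    ("d4", "i1", "f5", 5),
]

for row in assign_users(ROWS):
    print(row)
```

On the sample data from the question, this labels the first four rows and the last row `u1` and the middle four rows `u2`. Unlike a row-by-row check at insert time, it also merges groups when a later row links values that were previously attributed to different users.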

Upvotes: 1
