Reputation: 3
My data is in a SQL table in the following format (I have about 20 different answers per user_id):
USER_ID ANSWER
--------------------------
1 CAR
1 10-20 miles
1 SALES
2 TRAIN
2 0-10 miles
2 TEACHER
I would like to create a view in PostgreSQL where all the data is shown according to User ID
USER_ID ANSWER1 ANSWER2 ANSWER3
-----------------------------------------------
1 CAR 10-20 miles SALES
2 TRAIN 0-10 miles TEACHER
Thank you!
Upvotes: 0
Views: 98
Reputation: 45885
It is great example of not relational data (the semantic depends on row number), although the data are saved in relational database. You cannot to make relational data from not relational inside relational database. There is not any mechanism how to do this transformation safely. Theoretically, if physical order of your data is same like in your picture, then you can write queries (but you should not to use ORDER BY
clause, because it can change order, because your data doesn't hold necessary data for correct ordering):
You need function:
CREATE OR REPLACE FUNCTION public.x_transformed()
RETURNS TABLE(user_id integer, rowno integer, answer character varying)
LANGUAGE plpgsql
AS $function$
declare r record;
prev_user_id int;
begin
for r in select * from x
loop
if user_id is distinct from r.user_id then
rowno := 1;
else
rowno := rowno + 1;
end if;
user_id = r.user_id;
answer = r.answer;
return next;
end loop;
end;
$function$
This function can add missing order information to relation. It is pretty ugly solution, but your input data are in very unhappy format, and there is not possible any clean solution (based on relational algebra):
postgres=# select * from x_transformed();
┌─────────┬───────┬─────────────┐
│ user_id │ rowno │ answer │
╞═════════╪═══════╪═════════════╡
│ 1 │ 1 │ CAR │
│ 1 │ 2 │ 10-20 miles │
│ 1 │ 3 │ SALES │
│ 2 │ 1 │ TRAIN │
│ 2 │ 2 │ 0-10 miles │
│ 2 │ 3 │ TEACHER │
└─────────┴───────┴─────────────┘
(6 rows)
Now, the transformation to the requested format is easy (search pivoting):
select user_id,
max(answer) filter (where rowno = 1) as answer1,
max(answer) filter (where rowno = 2) as answer2,
max(answer) filter (where rowno = 3) as answer3
from x_transformed() group by user_id;
┌─────────┬─────────┬─────────────┬─────────┐
│ user_id │ answer1 │ answer2 │ answer3 │
╞═════════╪═════════╪═════════════╪═════════╡
│ 2 │ TRAIN │ 0-10 miles │ TEACHER │
│ 1 │ CAR │ 10-20 miles │ SALES │
└─────────┴─────────┴─────────────┴─────────┘
(2 rows)
The main problem is in format of your input data. It can be good enough for file and application processing, but are terrible bad for storing (and processing) in relation database. The relation (or the table) is heap (not file).
With knowledge of Postgres's internals, you can ensure expected order although you use some relational operations. You can use internal implicit column ctid
. ctid
is an unique address of row. This will not work on other databases, and it will not work on older Postgres releases:
select user_id,
max(answer) filter (where rowno = 1) as answer1,
max(answer) filter (where rowno = 2) as answer2,
max(answer) filter (where rowno = 3) as answer3
from (select user_id,
answer,
row_number() over (partition by user_id
order by ctid) as rowno
from x) s
group by user_id
order by user_id;
┌─────────┬─────────┬─────────────┬─────────┐
│ user_id │ answer1 │ answer2 │ answer3 │
╞═════════╪═════════╪═════════════╪═════════╡
│ 1 │ CAR │ 10-20 miles │ SALES │
│ 2 │ TRAIN │ 0-10 miles │ TEACHER │
└─────────┴─────────┴─────────────┴─────────┘
(2 rows)
Upvotes: 1
Reputation: 1401
fiddle for regex: https://regex101.com/r/oUgYbj/1
This would only work assuming you always have the same order for q1,q2,q3 I believe your data should have some field to order it by in order to avoid confusion something like a new column: Question_rank and then it showing the value 1 for Car and Train and value 2 for miles and value 3 for Jobs i.e Sales / Teacher.
This way we can have Question_rank in the order by clause as second column
with data as(
select
user_id,
STRING_AGG(answer,',') as all_answers
from table
group by 1
Order by id
)
select
user_id,
REGEXP_MATCHES (all_answers,'^([^,])+') AS answer_1,
REGEXP_MATCHES (all_answers,'\d+(-)\d+\s+\w+') AS answer_2
REGEXP_MATCHES (all_answers,'[^,]*$') AS answer_3
from data
Upvotes: 0