Sim_R
Sim_R

Reputation: 3

Arranging VARCHAR data into Columns in SQL

My data is in a SQL table in the following format (I have about 20 different answers per user_id):

USER_ID     ANSWER
--------------------------
1           CAR
1           10-20 miles 
1           SALES 
2           TRAIN
2           0-10 miles
2           TEACHER

I would like to create a view in PostgreSQL where all the data is shown according to User ID

USER_ID     ANSWER1     ANSWER2        ANSWER3
-----------------------------------------------
1           CAR         10-20 miles    SALES
2           TRAIN       0-10 miles     TEACHER

Thank you!

Upvotes: 0

Views: 98

Answers (2)

Pavel Stehule
Pavel Stehule

Reputation: 45885

It is great example of not relational data (the semantic depends on row number), although the data are saved in relational database. You cannot to make relational data from not relational inside relational database. There is not any mechanism how to do this transformation safely. Theoretically, if physical order of your data is same like in your picture, then you can write queries (but you should not to use ORDER BY clause, because it can change order, because your data doesn't hold necessary data for correct ordering):

You need function:

CREATE OR REPLACE FUNCTION public.x_transformed()
 RETURNS TABLE(user_id integer, rowno integer, answer character varying)
 LANGUAGE plpgsql
AS $function$
declare r record; 
prev_user_id int;
begin
  for r in select * from x
  loop
    if user_id is distinct from r.user_id then
      rowno := 1;
    else
      rowno := rowno + 1;
    end if;
    user_id = r.user_id;
    answer = r.answer;
    return next;
  end loop;
end;
$function$

This function can add missing order information to relation. It is pretty ugly solution, but your input data are in very unhappy format, and there is not possible any clean solution (based on relational algebra):

postgres=# select * from x_transformed();
┌─────────┬───────┬─────────────┐
│ user_id │ rowno │   answer    │
╞═════════╪═══════╪═════════════╡
│       1 │     1 │ CAR         │
│       1 │     2 │ 10-20 miles │
│       1 │     3 │ SALES       │
│       2 │     1 │ TRAIN       │
│       2 │     2 │ 0-10 miles  │
│       2 │     3 │ TEACHER     │
└─────────┴───────┴─────────────┘
(6 rows)

Now, the transformation to the requested format is easy (search pivoting):

select user_id,
       max(answer) filter (where rowno = 1) as answer1,
       max(answer) filter (where rowno = 2) as answer2,
       max(answer) filter (where rowno = 3) as answer3
 from x_transformed() group by user_id;
┌─────────┬─────────┬─────────────┬─────────┐
│ user_id │ answer1 │   answer2   │ answer3 │
╞═════════╪═════════╪═════════════╪═════════╡
│       2 │ TRAIN   │ 0-10 miles  │ TEACHER │
│       1 │ CAR     │ 10-20 miles │ SALES   │
└─────────┴─────────┴─────────────┴─────────┘
(2 rows)

The main problem is in format of your input data. It can be good enough for file and application processing, but are terrible bad for storing (and processing) in relation database. The relation (or the table) is heap (not file).

With knowledge of Postgres's internals, you can ensure expected order although you use some relational operations. You can use internal implicit column ctid. ctid is an unique address of row. This will not work on other databases, and it will not work on older Postgres releases:

select user_id,
       max(answer) filter (where rowno = 1) as answer1,
       max(answer) filter (where rowno = 2) as answer2,
       max(answer) filter (where rowno = 3) as answer3
  from (select user_id,
               answer,
               row_number() over (partition by user_id
                                  order by ctid) as rowno
          from x) s
 group by user_id
 order by user_id;
┌─────────┬─────────┬─────────────┬─────────┐
│ user_id │ answer1 │   answer2   │ answer3 │
╞═════════╪═════════╪═════════════╪═════════╡
│       1 │ CAR     │ 10-20 miles │ SALES   │
│       2 │ TRAIN   │ 0-10 miles  │ TEACHER │
└─────────┴─────────┴─────────────┴─────────┘
(2 rows)

Upvotes: 1

trillion
trillion

Reputation: 1401

fiddle for regex: https://regex101.com/r/oUgYbj/1

This would only work assuming you always have the same order for q1,q2,q3 I believe your data should have some field to order it by in order to avoid confusion something like a new column: Question_rank and then it showing the value 1 for Car and Train and value 2 for miles and value 3 for Jobs i.e Sales / Teacher.

This way we can have Question_rank in the order by clause as second column

with data as(
select
user_id,
STRING_AGG(answer,',') as all_answers
from table
group by 1
Order by id
)
select
user_id,
REGEXP_MATCHES (all_answers,'^([^,])+') AS answer_1,
REGEXP_MATCHES (all_answers,'\d+(-)\d+\s+\w+') AS answer_2
REGEXP_MATCHES (all_answers,'[^,]*$') AS answer_3
from data

Upvotes: 0

Related Questions