Reputation: 31
I'm using pgAdmin III / PostgreSQL 9.4 to store and work with my data. Sample of my current data:
x | y
--+--
0 | 1
1 | 1
2 | 1
5 | 2
5 | 2
2 | 2
4 | 3
6 | 3
2 | 3
How I'd like it to be formatted:
1, 2, 3
-- column names are unique y
values
0, 5, 4
-- the first respective x
values
1, 5, 6
-- the second respective x
values
2, 2, 2
-- etc.
It would need to be dynamic because I have millions of rows and thousands of unique values for y
.
Is using a dynamic pivot approach correct for this? I have not been able to successfully implement this:
DECLARE @columns VARCHAR(8000)
SELECT @columns = COALESCE(@columns + ',[' + cast(y as varchar) + ']',
'[' + cast(y as varchar)+ ']')
FROM tableName
GROUP BY y
DECLARE @query VARCHAR(8000)
SET @query = '
SELECT x
FROM tableName
PIVOT
(
MAX(x)
FOR [y]
IN (' + @columns + ')
)
AS p'
EXECUTE(@query)
It is stopping on the first line and giving the error:
syntax error at or near "@"
All dynamic pivot examples I've seen use this, so I'm not sure what I've done wrong. Any help is appreciated. Thank you for your time.
**Note: It is important for the x
values to be stored in the correct order, as sequence matters. I can add another column to indicate sequential order if necessary.
Upvotes: 3
Views: 1015
Reputation: 657052
The term "first row" assumes a natural order of rows, which does not exist in database tables. So, yes, you need to add another column to indicate sequential order
like you suspected. I am assuming a column tbl_id
for the purpose. Using the ctid
would be a measure of last resort. See:
The code you present looks like MS SQL Server code; invalid syntax for Postgres.
For millions of rows and thousands of unique values for Y
it wouldn't even make sense to try and return individual columns. Postgres has generous limits, but not nearly generous enough for that. According to the source code or the manual, the absolute maximum number of columns is 1600.
So we don't even get to discuss the restrictive characteristics of SQL, which demands to know columns and data types at execution time, not dynamically adjusted during execution. You would need two separate calls, like we discussed in great detail under this related question.
Another answer by Clodoaldo under the same question returns arrays. That can actually be completely dynamic. And that's what I suggest here, too. The query is actually rather simple:
WITH cte AS (
SELECT *, row_number() OVER (PARTITION BY y ORDER BY tbl_id) AS rn
FROM tbl
ORDER BY y, tbl_id
)
SELECT text 'y' AS col, array_agg (y) AS values
FROM cte
WHERE rn = 1
UNION ALL
( -- parentheses required
SELECT text 'x' || rn, array_agg (x)
FROM cte
GROUP BY rn
ORDER BY rn
);
Result:
col | values
----+--------
y | {1,2,3}
x1 | {0,5,4}
x2 | {1,5,6}
x3 | {2,2,2}
The CTE computes a row_number rn
for each row (each x
) per group of y
. We are going to use it twice, hence the CTE.
The 1st SELECT
in the outer query generates the array of y
values.
The 2nd SELECT
in the outer query generates all arrays of x
values in order. Arrays can have different length.
Why the parentheses for UNION ALL
? See:
Upvotes: 1