Reputation: 3736
I have a one to many relationship. In this case, it's a pipelines
entity that can have many segments
. The segments
entity has a column to list the wells associated with this pipeline. This column is purely informational, and is only updated from a regulatory source as a comma separated list, so the data type is text
.
What I want to do is to list all the pipelines
and show the segment
column that has the most associated wells. Each well is identified with a standardized land location (text is the same length for each well). I am also doing other aggregate functions on the segments
, so my query looks something like this (I have to simplify it because it's pretty large):
SELECT pipelines.*, max(segments.associated_wells), min(segments.days_without_production), max(segments.production_water_m3)
FROM pipelines
JOIN segments ON segments.pipeline_id = pipelines.id
GROUP BY pipelines.id
This selects the associated_wells
that has the highest alphabetical value, which makes sense, but is not what I want.
max(length(segments.associated_wells))
will select the record I want, but only show the length. I need to show the column value.
How can I aggregate based on the string length but show the value?
Here's an example of what I am expecting:
Segment entity:
| id | pipeline_id | associated_wells | days_without_production | production_water_m3 |
|----|-------------|--------------------------|-------------------------|---------------------|
| 1 | 1 | 'location1', 'location2' | 30 | 2.3 |
| 2 | 1 | 'location1' | 15 | 1.4 |
| 3 | 2 | 'location1' | 20 | 1.8 |
Pipeline entity:
| id | name |
|----|-------------|
| 1 | 'Pipeline1' |
| 2 | 'Pipeline2' |
| | |
Desired Query Result:
| id | name | associated_wells | days_without_production | production_water_m3 |
|----|-------------|--------------------------|-------------------------|---------------------|
| 1 | 'Pipeline1' | 'location1', 'location2' | 15 | 2.3 |
| 2 | 'Pipeline2' | 'location1' | 20 | 1.8 |
| | | | | |
Upvotes: 1
Views: 1602
Reputation: 6721
Keep normalising and verticalise the locations/associated wells, by cross joining with a series of integers, and then group twice:
WITH
segment(seg_id,pipeline_id,associated_wells,days_without_production,production_water_m3) AS (
SELECT 1,1,'location1, location2',30,2.3
UNION ALL SELECT 2,1,'location1',15,1.4
UNION ALL SELECT 3,2,'location1',20,1.8
)
,
pipeline(pipeline_id,name) AS (
SELECT 1,'Pipeline1'
UNION ALL SELECT 2,'Pipeline2'
)
,
i(i) AS (
SELECT 1
UNION ALL SELECT 2
UNION ALL SELECT 3
)
,
location AS (
SELECT
seg_id
, i AS loc_id
, SPLIT_PART(associated_wells,', ',i) AS location
FROM segment CROSS JOIN i
WHERE SPLIT_PART(associated_wells,',' ,i) <>''
)
,
pregroup AS (
SELECT
segment.pipeline_id
, location.location
, MIN(days_without_production) AS days_without_production
, MAX(production_water_m3) AS production_water_m3
FROM segment
JOIN pipeline USING(pipeline_id)
JOIN location USING(seg_id)
GROUP BY 1,2
)
SELECT
pipeline_id
, STRING_AGG(location,',') AS locations
, MIN(days_without_production) AS days_without_production
, MAX(production_water_m3) AS production_water_m3
FROM pregroup
GROUP BY 1;
pipeline_id | locations | days_without_production | production_water_m3
-------------+---------------------+-------------------------+---------------------
1 | location1,location2 | 15 | 2.3
2 | location1 | 20 | 1.8
Upvotes: 0
Reputation: 1269443
If I understand correctly, you want DISTINCT ON
:
SELECT DISTINCT ON (p.id) p.*, s.*
FROM pipelines p JOIN
segments s
ON s.pipeline_id = p.id
ORDER BY p.id, LENGTH(s.associated_wells) DESC;
Upvotes: 1