cvc
cvc

Reputation: 37

MySQL using GROUP BY after multiple UNIONS

I have a fairly complex MySQL Query with multiple UNION statements. I am trying to eliminate duplicates from the final output, but not all the columns are identical (including the ID). So I want to use GROUP BY on the 'name' column to eliminate entries with the same name. This is for a map mashup of a number of other map layers. Some of the place markers appear on multiple layers (i.e. a restaurant may appear on the 'dining' layer, and the 'bars' layer, and the 'Home Cooking' layer, each with a different ID).

Each UNION SELECT statement produces only one row that has the place's 'name', but by the end, after all the UNIONs have run, I have multiple entries for the same place. So I feel like I need to somehow process the GROUP BY on all the data after every UNION has occurred.

I will try to illustrate with a simplified version of the statement...

(
    SELECT l.id AS lid, m.markername AS markername, m.id AS mid, m.layer AS mlayer
    FROM layertable 
    INNER JOIN markertable 
    ON m.layer LIKE Concat('%"',l.id,'"%') 
    WHERE l.id='66' 
    ## RESULT INCLUDES Joes Place AND Eatery
)
UNION
(
    SELECT l.id AS lid, m.markername AS markername, m.id AS mid, m.layer AS mlayer
    FROM layertable 
    INNER JOIN markertable 
    ON m.layer LIKE Concat('%"',l.id,'"%') 
    WHERE l.id='82' 
    ## RESULT INCLUDES Joes Place AND Eatery
)
UNION
(
    SELECT l.id AS lid, m.markername AS markername, m.id AS mid, m.layer AS mlayer
    FROM layertable 
    INNER JOIN markertable 
    ON m.layer LIKE Concat('%"',l.id,'"%') 
    WHERE l.id='91' 
    ## RESULT INCLUDES Joes Place
)

ORDER BY markername ASC
LIMIT 10

Results look something like...

| lid | markername | mid | mlayer               |
=================================================
| 66  | Joes Place | 10  | ["66","82","91"]     |
| 82  | Joes Place | 10  | ["66","82","91"]     |
| 91  | Joes Place | 10  | ["66","82","91"]     |
| 66  | Eatery     | 11  | ["66","82"]          |
| 82  | Eatery     | 11  | ["66","82"]          |

What I want is...

| lid | markername | mid | mlayer               |
=================================================
| 91  | Joes Place | 10  | ["66","82","91"]     |
| 82  | Eatery     | 11  | ["66","82"]          |

DISTINCT doesn't seem to work, because the records are not completely identical.

I have tried to put GROUP BY markername before and after ORDER BY but I get a syntax error message either way. Applying it within the individual SELECTS doesn't help because each table will only have one instance of the place anyway.

So, to reiterate my question: How to apply GROUP BY to the total list, after UNION and output just the uniquely named places? OR is there some other way to accomplish this task?

Thanks in advance.

Upvotes: 0

Views: 326

Answers (4)

Anthony
Anthony

Reputation: 37065

You could do a GROUP BY of all of the results, like:

SELECT * FROM 
(
    SELECT l.id AS lid, m.markername AS markername, m.id AS mid, m.layer AS mlayer
    FROM layertable AS l
    INNER JOIN markertable 
    ON m.layer LIKE Concat('%"',l.id,'"%') 
    WHERE l.id='66' 
    ## RESULT INCLUDES Joes Place AND Eatery
)
UNION
(
    SELECT l.id AS lid, m.markername AS markername, m.id AS mid, m.layer AS mlayer
    FROM layertable AS l
    INNER JOIN markertable 
    ON m.layer LIKE Concat('%"',l.id,'"%') 
    WHERE l.id='82' 
    ## RESULT INCLUDES Joes Place AND Eatery
)
UNION
(
    SELECT l.id AS lid, m.markername AS markername, m.id AS mid, m.layer AS mlayer
    FROM layertable AS l
    INNER JOIN markertable 
    ON m.layer LIKE Concat('%"',l.id,'"%') 
    WHERE l.id='91' 
    ## RESULT INCLUDES Joes Place
)

ORDER BY markername ASC
LIMIT 10
) AS makernames
GROUP BY makername

Or you could just leave out the parts that are causing the results to be non-unique. Like:

(
    SELECT m.markername AS markername, m.id AS mid, m.layer AS mlayer
    FROM layertable AS l
    INNER JOIN markertable 
    ON m.layer LIKE Concat('%"',l.id,'"%') 
    WHERE l.id='66' 
    ## RESULT INCLUDES Joes Place AND Eatery
)
UNION
(
    SELECT m.markername AS markername, m.id AS mid, m.layer AS mlayer
    FROM layertable AS l
    INNER JOIN markertable 
    ON m.layer LIKE Concat('%"',l.id,'"%') 
    WHERE l.id='82' 
    ## RESULT INCLUDES Joes Place AND Eatery
)
UNION
(
    SELECT m.markername AS markername, m.id AS mid, m.layer AS mlayer
    FROM layertable AS l
    INNER JOIN markertable 
    ON m.layer LIKE Concat('%"',l.id,'"%') 
    WHERE l.id='91' 
    ## RESULT INCLUDES Joes Place
)

ORDER BY markername ASC
LIMIT 10 

You don't need to have l.id in the SELECT clause for it to work in the WHERE clause. And I assume that if you are willing to lose the lid on some rows just to have only one row per makername that that means you don't actually need lid at all in the results.

Upvotes: 1

symcbean
symcbean

Reputation: 48357

(this should be a comment, but its a bit long)

because the records are not completely identical

Then you NEED TO BE A LOT MORE SPECIFIC ABOUT WHAT YOU MEAN BY DUPLICATES. In your example you have extracted the MAX(l.id) - is that your intention?

Why are you using a UNION here? You could just...

SELECT MAX(lid), markername, mid, mlayer
FROM ( 
  SELECT l.id AS lid, m.markername, m.id AS mid, m.layer AS mlayer
  FROM layertable 
  INNER JOIN markertable 
  ON m.layer LIKE Concat('%"',l.id,'"%') 
  WHERE l.id IN ('91', '82', '66')
  LIMIT 10
) AS ilv
GROUP BY markername, mid, mlayer

Using LIKE is a JOIN is horrible and implies your data is not normalized. Why are the integer values being quoted in your query?

(note that the subselect may be redundant / speeding things up or slowing them down depending on the distribution of your data and the indexes available).

Don't you think you should be organizing map data around geospatial indexing?

Upvotes: 0

slaakso
slaakso

Reputation: 9070

You can do it all with one query:

SELECT max(l.id) AS lid, m.markername AS markername, m.id AS mid, m.layer AS mlayer
FROM layertable 
INNER JOIN markertable 
ON m.layer LIKE Concat('%"',l.id,'"%') 
  WHERE l.id in ('66','82','91')
GROUP BY m.markername, m.id, m.layer

Upvotes: 0

ScaisEdge
ScaisEdge

Reputation: 133370

You could use a group by and max(id)

  select max(lid), markername, mid, mlayer 
  from (


  (
      SELECT l.id AS lid, m.markername AS markername, m.id AS mid, m.layer AS mlayer
      FROM layertable 
      INNER JOIN markertable 
      ON m.layer LIKE Concat('%"',l.id,'"%') 
      WHERE l.id='66' 
      ## RESULT INCLUDES Joes Place AND Eatery
  )
  UNION
  (
      SELECT l.id AS lid, m.markername AS markername, m.id AS mid, m.layer AS mlayer
      FROM layertable 
      INNER JOIN markertable 
      ON m.layer LIKE Concat('%"',l.id,'"%') 
      WHERE l.id='82' 
      ## RESULT INCLUDES Joes Place AND Eatery
  )
  UNION
  (
      SELECT l.id AS lid, m.markername AS markername, m.id AS mid, m.layer AS mlayer
      FROM layertable 
      INNER JOIN markertable 
      ON m.layer LIKE Concat('%"',l.id,'"%') 
      WHERE l.id='91' 
      ## RESULT INCLUDES Joes Place
  )

  ORDER BY markername ASC
  LIMIT 10 
  ) t
  group by markername, mid, mlayer 

Upvotes: 1

Related Questions