Reputation: 1032
I'm trying to work out if this is possible, let me give an example. Would be awesome if you could guide me in the right direction please.
Table = names
--------------------
Marks & Spencer
Marks & Spencer
marks & spencer
What I am trying to do is to return distinct values where I have converted all & signs and changed to upper case.
So my query is:
SELECT regexp_replace(UPPER(name), '&(amp;)*|\\+', '&', 'gi') AS name FROM names GROUP BY names;
However, what I would like to do is also return one of the original values, it does not matter which one, but I only want 1 row to be returned, like
Result
----------------
name original
------------------------
MARKS&SPENCER Marks & Spencer
Is this possible? Because at the moment, what I get returned is this:
Result
----------------
name original
------------------------
MARKS&SPENCER Marks & Spencer
MARKS&SPENCER Marks & Spencer
MARKS&SPENCER marks & spencer
Thank you for reading, would really appreciate the help.
==========
The query I am using to get the above result is:
SELECT names.name, T.result FROM names
INNER JOIN
(
SELECT DISTINCT regexp_replace(UPPER(name), '&(amp;)*|\\+', '&', 'gi') AS result FROM names
) AS T
ON regexp_replace(UPPER(name), '&(amp;)*|\\+', '&', 'gi')=T.result
GROUP BY T.result, names.name
ORDER BY T.result ASC
I am using PostgreSQL btw, which can do more than MySQL incase that changes things?
Upvotes: 3
Views: 971
Reputation: 659017
There is still room for improvement:
SELECT regexp_replace(upper(name), E'&(?:AMP;)+|\\+', '&', 'g') AS name
, min(name) AS min_org_name
-- , string_agg(name) AS org_names -- if you want a list of originals
-- , array_to_string(array_agg(name), ', ') AS org_names -- for pg < 9.0+
, count(*) AS ct
FROM (
SELECT *
FROM (VALUES
('Marks & Spencer')
, ('Marks & Spencer')
, ('marks & spencer')
, ('marks & speNceR + sons')
, ('marks &amp;AMP; speNceR & sons')
) AS names(name)
) name
GROUP BY 1;
&(amp;)*
with identical &(amp;)+
upper()
on the original, the 'i' flag only slows execution. Rather upper case pattern, too: &(AMP;)+
(?:)
\\+
, use proper syntax E''
GROUP BY
with positional parameter, no need to spell it out twiceUpvotes: 1
Reputation: 86798
At present you're grouping by the original field (you can't group by a field in your select).
Do you want one of these?
SELECT DISTINCT
name AS original,
regexp_replace(UPPER(name), '&(amp;)*|\\+', '&', 'gi') AS name
FROM
names
Or...
SELECT
name AS original,
regexp_replace(UPPER(name), '&(amp;)*|\\+', '&', 'gi') AS name
FROM
names
GROUP BY
name,
regexp_replace(UPPER(name), '&(amp;)*|\\+', '&', 'gi')
Or...
SELECT
original,
name
FROM
(
SELECT
name AS original,
regexp_replace(UPPER(name), '&(amp;)*|\\+', '&', 'gi') AS name
FROM
names
)
AS clean_data
GROUP BY
original,
name
Upvotes: 0
Reputation: 7803
You need to group by the new name to get only one row and, as you don't care which original name appears, aggregate it with something like min
:
SELECT min(name),regexp_replace(UPPER(name), '&(amp;)*|\\+', '&', 'gi') AS name
FROM names
GROUP BY regexp_replace(UPPER(name), '&(amp;)*|\\+', '&', 'gi')
Upvotes: 1