Reputation: 365
Good morning!
I'm working with SQL and need group a serie of samples in 2 different ways. I have an initial data frame like the following:
customer_id sample_id
1 1
1 11
1 2
2 3
2 4
2 5
2 5
What I want to achieve is grouping the total different id samples and the non-single id samples, taking in to account that id samples greater than 10 are the same as its unit, i.e. sample 11 is the same as sample 1.
My desire result would be the following:
customer_id total_sample ns_sample
1 2 1
2 3 1
total_sample for customer 1 is 2 cause we have samples 1 and 2 (11 is the same as 1), and for customer 2 is 3 cause we have samples 3, 4 and 5.
ns_sample is 1 for both customers cause the have just 1 duplicated (or more) sample (1 for customer 1 and 5 for customer 2).
Right now I have the next query:
SELECT t.customer_id,
count(distinct CASE WHEN LEFT(t.sample_id, 1) = 1 THEN t.sample_id - 10 ELSE t.sample_id END) as total_sample
FROM my_table as t
GROUP BY t.customer_id
However, I don't know how to achieve the column ns_sample in the same query...
Thanks you very much in advance!
Upvotes: 1
Views: 29
Reputation: 49375
This query will give you what you need and it works on mysql 5.x
You have to REPLACE the 11 and other Value that are same with Replace, or else this wouldn't work.
The t1 table that is left joined has actually all the sample_ids that are doubles, so if you b´need that as well, you can get it from that query
SELECT
t.`customer_id`
, COUNT(DISTINCT (t.`sample_id`)) total_sample
, MAX(IF(t1.count_ IS NULL,0,1)) ns_sample
FROM
(SELECT
`customer_id`, REPLACE(`sample_id`, '11', '1') `sample_id`
FROM
mytable) t
LEFT JOIN
(SELECT
`customer_id`, `sample_id`, COUNT(*) count_
FROM
(SELECT
`customer_id`, REPLACE(`sample_id`, '11', '1') `sample_id`
FROM
mytable) mytable
GROUP BY `customer_id` , `sample_id`
HAVING count_ > 1) t1 ON t.`customer_id` = t1.`customer_id`
GROUP BY `customer_id`;
Schema (MySQL v5.7)
CREATE TABLE mytable (
`customer_id` INTEGER,
`sample_id` INTEGER
);
INSERT INTO mytable
(`customer_id`, `sample_id`)
VALUES
('1', '1'),
('1', '11'),
('1', '2'),
('2', '3'),
('2', '4'),
('2', '5'),
('2', '5');
Query #1
SELECT
t.`customer_id`
, COUNT(DISTINCT (t.`sample_id`)) total_sample
, MAX(IF(t1.count_ IS NULL,0,1)) ns_sample
FROM
(SELECT
`customer_id`, REPLACE(`sample_id`, '11', '1') `sample_id`
FROM
mytable) t
LEFT JOIN
(SELECT
`customer_id`, `sample_id`, COUNT(*) count_
FROM
(SELECT
`customer_id`, REPLACE(`sample_id`, '11', '1') `sample_id`
FROM
mytable) mytable
GROUP BY `customer_id` , `sample_id`
HAVING count_ > 1) t1 ON t.`customer_id` = t1.`customer_id`
GROUP BY `customer_id`;
| customer_id | total_sample | ns_sample |
| ----------- | ------------ | --------- |
| 1 | 2 | 1 |
| 2 | 3 | 1 |
Upvotes: 1