Reputation: 5251
I have an SQL table with 4 columns:
id - varchar(50)
g1 - varchar(50)
g2 - varchar(50)
datetime - timestamp
I have this query:
SELECT g1,
COUNT(DISTINCT id),
SUM(COUNT(DISTINCT id)) OVER () AS total,
(CAST(COUNT(DISTINCT id) AS float) / SUM(COUNT(DISTINCT id)) OVER ()) AS share
FROM my_table
WHERE g2 = 'start'
GROUP BY 1
order by share desc
This query was built to answer: what is the distribution of g1 values across users?
Each id may have multiple records in the table, and I want to consider only the earliest one, that is, the record with the minimum datetime value.
id   g1   g2      datetime
x1   a    start   2016-01-19 21:01:22
x1   c    start   2016-01-19 21:01:21
x2   b    start   2016-01-19 09:03:42
x1   a    start   2016-01-18 13:56:45
g1   count   total   share
a    2       4       0.5
b    1       4       0.25
c    1       4       0.25
There are 4 records, but I only want to consider two of them:
x2   b    start   2016-01-19 09:03:42
x1   a    start   2016-01-18 13:56:45
These are the earliest records per id, so the expected result is:
g1 count total share
a 1 2 0.5
b 1 2 0.5
How do I consider only the earliest record per id in the GROUP BY?
Upvotes: 6
Views: 101
Reputation: 11152
You are querying all the data in my_table, although you only want the row with the earliest date for each id. I assume id is the primary key in the table.
I suggest you define a view (or inline view) that returns only the earliest row for each id, and run your query against that view instead of against my_table.
The view could be defined as follows and would contain only the earliest-dated row per id:
select * from my_table a
where a.datetime = (select min(z.datetime) from my_table z where a.id = z.id) and a.g2 = 'start'
You can define that as a view or use it directly inline as in:
SELECT g1,
COUNT(DISTINCT id),
SUM(COUNT(DISTINCT id)) OVER () AS total,
(CAST(COUNT(DISTINCT id) AS float) / SUM(COUNT(DISTINCT id)) OVER ()) AS share
FROM (select a.id, a.g1, a.g2, a.datetime from my_table a where a.datetime = (select min(z.datetime) from my_table z where a.id = z.id) and a.g2 = 'start') t
GROUP BY 1
order by share desc
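To check the filtering step in isolation, here is a minimal sketch that loads the question's four sample rows into an in-memory SQLite database and runs only the correlated-subquery filter. The Python wrapper, the helper name earliest_rows, and storing the timestamps as ISO-formatted text (so that MIN compares them in chronological order) are assumptions made for illustration; they are not part of the original answer.

```python
import sqlite3

# The four sample rows from the question: (id, g1, g2, datetime).
ROWS = [
    ("x1", "a", "start", "2016-01-19 21:01:22"),
    ("x1", "c", "start", "2016-01-19 21:01:21"),
    ("x2", "b", "start", "2016-01-19 09:03:42"),
    ("x1", "a", "start", "2016-01-18 13:56:45"),
]

# Same filter as the inline view above: keep a row only if its datetime
# equals the minimum datetime recorded for that id.
EARLIEST_SQL = """
SELECT a.id, a.g1, a.g2, a.datetime
FROM my_table a
WHERE a.datetime = (SELECT MIN(z.datetime) FROM my_table z WHERE z.id = a.id)
  AND a.g2 = 'start'
ORDER BY a.id
"""


def earliest_rows(rows):
    """Hypothetical helper: load rows into SQLite and return the earliest row per id."""
    conn = sqlite3.connect(":memory:")
    # Assumption: timestamps are stored as ISO-8601 text, which sorts chronologically.
    conn.execute("CREATE TABLE my_table (id TEXT, g1 TEXT, g2 TEXT, datetime TEXT)")
    conn.executemany("INSERT INTO my_table VALUES (?, ?, ?, ?)", rows)
    result = conn.execute(EARLIEST_SQL).fetchall()
    conn.close()
    return result


if __name__ == "__main__":
    for row in earliest_rows(ROWS):
        print(row)
```

On the sample data this keeps one row for x1 and one for x2, which are the two rows the question asks for. Note that if an id has two rows sharing the same minimum datetime, this filter keeps both of them.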
Upvotes: 1
Reputation: 5031
Try the query below.
;WITH cte_1
as (SELECT id, MIN(datetime) AS [Date]
FROM YourTable
GROUP BY id
)
SELECT yt.g1,
COUNT(DISTINCT yt.id) [Count],
SUM(COUNT(DISTINCT yt.id)) OVER () AS total,
(CAST(COUNT(DISTINCT yt.id) AS float) / SUM(COUNT(DISTINCT yt.id)) OVER ()) AS share
FROM cte_1 c
JOIN YourTable yt
ON yt.[datetime]=c.[Date] AND yt.id=c.id
and yt.g2 = 'start'
GROUP BY yt.g1
ORDER BY share DESC
Upvotes: 2
Reputation: 10206
I don't know which DBMS you are using, so here is a standard ANSI way to do this:
SELECT T1.g1,
COUNT(DISTINCT T1.id),
SUM(COUNT(DISTINCT T1.id)) OVER () AS total,
(CAST(COUNT(DISTINCT T1.id) AS float) / SUM(COUNT(DISTINCT T1.id)) OVER ()) AS share
FROM my_table T1
INNER JOIN
(SELECT id, MIN(datetime) AS mindt
FROM my_table
GROUP BY id
) T2 ON T1.datetime=T2.mindt AND T1.id=T2.id
and T1.g2 = 'start'
GROUP BY 1
order by share desc
It might be slow if you have a large table and datetime is not indexed.
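If the self-join is a concern, one alternative worth trying is to rank each id's rows with ROW_NUMBER() and keep rank 1, which avoids joining the table to itself. This is a different technique from the join above, shown only as a sketch: it is run through Python's sqlite3 module (which needs an SQLite build with window-function support, 3.25 or later), the names ranked, first_rows and distribution are hypothetical, and the timestamps are assumed to be stored as ISO-formatted text. Unlike the join, it applies the g2 = 'start' filter before picking the earliest row, so it returns the earliest 'start' row per id.

```python
import sqlite3

# The four sample rows from the question: (id, g1, g2, datetime).
ROWS = [
    ("x1", "a", "start", "2016-01-19 21:01:22"),
    ("x1", "c", "start", "2016-01-19 21:01:21"),
    ("x2", "b", "start", "2016-01-19 09:03:42"),
    ("x1", "a", "start", "2016-01-18 13:56:45"),
]

QUERY = """
WITH ranked AS (
    -- Number each id's 'start' rows from earliest to latest.
    SELECT id, g1,
           ROW_NUMBER() OVER (PARTITION BY id ORDER BY datetime) AS rn
    FROM my_table
    WHERE g2 = 'start'
),
first_rows AS (
    -- Exactly one row per id: the earliest one.
    SELECT id, g1 FROM ranked WHERE rn = 1
)
SELECT g1,
       COUNT(*) AS cnt,
       (SELECT COUNT(*) FROM first_rows) AS total,
       CAST(COUNT(*) AS REAL) / (SELECT COUNT(*) FROM first_rows) AS share
FROM first_rows
GROUP BY g1
ORDER BY share DESC, g1
"""


def distribution(rows):
    """Hypothetical helper: return (g1, cnt, total, share) computed over earliest rows."""
    conn = sqlite3.connect(":memory:")
    # Assumption: timestamps are stored as ISO-8601 text, which sorts chronologically.
    conn.execute("CREATE TABLE my_table (id TEXT, g1 TEXT, g2 TEXT, datetime TEXT)")
    conn.executemany("INSERT INTO my_table VALUES (?, ?, ?, ?)", rows)
    result = conn.execute(QUERY).fetchall()
    conn.close()
    return result


if __name__ == "__main__":
    for row in distribution(ROWS):
        print(row)
```

Because ROW_NUMBER() assigns exactly one row the rank 1 within each id, ties on the minimum datetime cannot produce duplicate rows here, although which of the tied rows is kept is then arbitrary.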
Upvotes: 2
Reputation: 520918
Here is a solution which should work in SQL Server, and in any database which supports CTEs:
WITH cte AS
(
-- one row per g1, counting only each id's earliest record
SELECT t1.g1,
COUNT(*) AS count
FROM yourTable t1
INNER JOIN
(
SELECT id, MIN(datetime) AS datetime
FROM yourTable
GROUP BY id
) t2
ON t1.id = t2.id AND
t1.datetime = t2.datetime
GROUP BY t1.g1
)
SELECT t.g1,
t.count,
(SELECT SUM(count) FROM cte) AS total,
-- cast to float to avoid integer division
CAST(t.count AS float) / (SELECT SUM(count) FROM cte) AS share
FROM cte t
Upvotes: 2