Reputation: 1601
Following-up on my question here. Say I have a table in an Oracle database like the one below (table_1) which tracks service involvement for a particular individual:
name day srvc_ inv
bill 1 1
bill 2 1
bill 3 0
bill 4 0
bill 5 1
bill 6 0
susy 1 1
susy 2 0
susy 3 1
susy 4 0
susy 5 1
My goal is to get a summary table which lists, for all unique individuals, whether there was service involvement and the number of distinct service episodes (in this case 2 for bill and 3 for susy), where a distinct service episode is identified by a break in activity over days.
To get any service involvement, I would use the following query
SELECT table_1."Name", MAX(table_1."Name") AS "any_invl"
FROM table_1
GROUP BY table_1."Name"
However, I'm stuck as to how I would get the number of service involvements (2). Using a static dataframe in R, you would use run length encoding (see my original question), but I don't know how I could accomplish this in SQL. This operation would be run over a large number of records so it would be impractical to store the entire data frame as an object and then run it in R.
Edit: My expect output would be as follows:
name any_invl n_srvc_inv
bill 1 2
susy 1 3
Thanks for any help!
Upvotes: 3
Views: 226
Reputation: 1269493
I would suggest using lag()
. The idea is to count a "1", but only when the preceding value is zero or null
:
select name, count(*)
from (select t.*,
lag(srvc_inv) over (partition by name order by day) as prev_srvc_inv
from t
) t
where (prev_srvc_inv is null or prev_srvc_inv = 0) and
srvc_inv = 1
group by name;
You can simplify this a little by using a default value for lag()
:
select name, count(*)
from (select t.*,
lag(srvc_inv, 1, 0) over (partition by name order by day) as prev_srvc_inv
from t
) t
where prev_srvc_inv = 0 and srvc_inv = 1
group by name;
Upvotes: 2
Reputation: 13509
You can try below query, having LAG function to handle the change in srvc_invl
select name, 1 any_invl, count(case when diff = 1 then 1 end) n_srvc_inv
from (select name, day, srvc_inv - LAG(srvc_inv, 1, 0) OVER(ORDER BY name, day) diff
from tab
order by name, day) temp
group by name
Here is the fiddle for your reference.
Upvotes: 1
Reputation: 142720
Something like this?
SQL> with test (name, day, srvc_inv) as
2 (select 'bill', 1, 1 from dual union all
3 select 'bill', 2, 1 from dual union all
4 select 'bill', 3, 0 from dual union all
5 select 'bill', 4, 0 from dual union all
6 select 'bill', 5, 1 from dual union all
7 select 'bill', 6, 0 from dual union all
8 select 'susy', 1, 1 from dual union all
9 select 'susy', 2, 0 from dual union all
10 select 'susy', 3, 1 from dual union all
11 select 'susy', 4, 0 from dual union all
12 select 'susy', 5, 1 from dual
13 ),
14 inter as
15 (select name, day, srvc_inv,
16 nvl(lead(srvc_inv) over (partition by name order by day), 0) lsrvc
17 from test
18 )
19 select name,
20 sum(case when srvc_inv <> lsrvc and lsrvc = 0 then 1
21 else 0
22 end) grp
23 from inter
24 group by name;
NAME GRP
---- ----------
bill 2
susy 3
SQL>
Upvotes: 3