Reputation: 1
I have a series of bird species names that are sorted taxonomically (by ID). I would like to cluster sub-species (which currently have a clust value of 0) with the species (clust <> 0) that falls above them in the table.
Here is a subset of the data and below is the desired query output to update the clust with newclust
Table Data
ID | Clust | code |
---|---|---|
2 | 2 | ostric2 |
3 | 0 | comost1 |
4 | 0 | comost2 |
5 | 0 | comost3 |
6 | 0 | comost4 |
7 | 3 | ostric3 |
8 | 0 | y00934 |
10 | 4 | grerhe1 |
11 | 0 | grerhe2 |
12 | 0 | grerhe3 |
13 | 0 | grerhe4 |
14 | 0 | grerhe5 |
15 | 0 | grerhe6 |
16 | 5 | lesrhe2 |
17 | 6 | lesrhe4 |
18 | 0 | lesrhe1 |
19 | 0 | lesrhe5 |
20 | 7 | lesrhe3 |
I'd like to fill in the 0 values in clust with the number (ID of the species) directly above them in the table.
Desired query output
ID | Clust | code | NewClust |
---|---|---|---|
2 | 2 | ostric2 | 2 |
3 | 0 | comost1 | 2 |
4 | 0 | comost2 | 2 |
5 | 0 | comost3 | 2 |
6 | 0 | comost4 | 2 |
7 | 3 | ostric3 | 3 |
8 | 0 | y00934 | 3 |
10 | 4 | grerhe1 | 4 |
11 | 0 | grerhe2 | 4 |
12 | 0 | grerhe3 | 4 |
13 | 0 | grerhe4 | 4 |
14 | 0 | grerhe5 | 4 |
15 | 0 | grerhe6 | 4 |
16 | 5 | lesrhe2 | 5 |
17 | 6 | lesrhe4 | 6 |
18 | 0 | lesrhe1 | 6 |
19 | 0 | lesrhe5 | 6 |
20 | 7 | lesrhe3 | 7 |
I tried a left join thinking that it would give me access to all of the records in the a table to find that record where a.id is < b.id and a.clust is non-zero.
select b.id, b.clust, b.code, (select max(a.id) where a.clust<>0 and a.id<b.id)
from db as a
left join db as b on a.id=b.id;
The query gives me a null value for Newclust whether I use max or not. I suspect it is due to forcing a.id=b.id. I can see how to do this with a procedure-oriented programming language but would like to use a mysql query.
Thanks
Upvotes: 0
Views: 53
Reputation: 5926
Usually such tasks are solved using window functions. Take a look at the topic "gaps and islands".
When using self-join, as in your example, you need to calculate the join condition.
See example
select t.*,coalesce(b.Clust,t.Clust) as newClust
from(
select a.*
,(select max(a2.id) from data a2 where a2.clust<>0 and a2.id<a.id) prevId
from data as a
)t
left join data as b on b.id=t.prevId
OR, same
select a.*,coalesce(b.Clust,a.Clust) as newClust
from data as a
left join data as b
on b.id=(select max(a2.id) from data a2 where a2.clust<>0 and a2.id<a.id)
ID | Clust | code | prevId | newClust |
---|---|---|---|---|
2 | 2 | ostric2 | null | 2 |
3 | 0 | comost1 | 2 | 2 |
4 | 0 | comost2 | 2 | 2 |
5 | 0 | comost3 | 2 | 2 |
6 | 0 | comost4 | 2 | 2 |
7 | 3 | ostric3 | 2 | 2 |
8 | 0 | y00934 | 7 | 3 |
10 | 4 | grerhe1 | 7 | 3 |
11 | 0 | grerhe2 | 10 | 4 |
12 | 0 | grerhe3 | 10 | 4 |
13 | 0 | grerhe4 | 10 | 4 |
14 | 0 | grerhe5 | 10 | 4 |
15 | 0 | grerhe6 | 10 | 4 |
16 | 5 | lesrhe2 | 10 | 4 |
17 | 6 | lesrhe4 | 16 | 5 |
18 | 0 | lesrhe1 | 17 | 6 |
19 | 0 | lesrhe5 | 17 | 6 |
20 | 7 | lesrhe3 | 17 | 6 |
With window funtions and method "gaps and islands", every row with Cust<>0 is first row of group - gap
. Running sum of "first row of group" (gap number) is number of group.
Then we calculate new Clust as max(Clust) within group (partition by ...).
select *
,max(Clust)over(partition by clustGroup)newClust
from(
select *
,sum(case when Clust<>0 then 1 else 0 end)over(order by id)clustGroup
from data
)t
ID | Clust | code | clustGroup | newClust |
---|---|---|---|---|
2 | 2 | ostric2 | 1 | 2 |
3 | 0 | comost1 | 1 | 2 |
4 | 0 | comost2 | 1 | 2 |
5 | 0 | comost3 | 1 | 2 |
6 | 0 | comost4 | 1 | 2 |
7 | 3 | ostric3 | 2 | 3 |
8 | 0 | y00934 | 2 | 3 |
10 | 4 | grerhe1 | 3 | 4 |
11 | 0 | grerhe2 | 3 | 4 |
12 | 0 | grerhe3 | 3 | 4 |
13 | 0 | grerhe4 | 3 | 4 |
14 | 0 | grerhe5 | 3 | 4 |
15 | 0 | grerhe6 | 3 | 4 |
16 | 5 | lesrhe2 | 4 | 5 |
17 | 6 | lesrhe4 | 5 | 6 |
18 | 0 | lesrhe1 | 5 | 6 |
19 | 0 | lesrhe5 | 5 | 6 |
20 | 7 | lesrhe3 | 6 | 7 |
Upvotes: 0
Reputation: 20
Use the Custom variables:
SELECT
ID,
Clust,
CODE,
case when Clust != 0 then @pre := Clust ELSE @pre END AS NewClust
FROM test,(SELECT @pre := NULL) tmp
ORDER BY ID
Upvotes: 0