I have a series of bird species names that are sorted taxonomically (by ID). I would like to cluster sub-species (which currently have a clust value of 0) with the species (clust <> 0) that falls above them in the table. Here is a subset of the data and below is the desired query output to update the clust with newclust Table Data ID Clust code 2 2 ostric2 3 0 comost1 4 0 comost2 5 0 comost3 6 0 comost4 7 3 ostric3 8 0 y00934 10 4 grerhe1 11 0 grerhe2 12 0 grerhe3 13 0 grerhe4 14 0 grerhe5 15 0 grerhe6 16 5 lesrhe2 17 6 lesrhe4 18 0 lesrhe1 19 0 lesrhe5 20 7 lesrhe3 I'd like to fill in the 0 values in clust with the number (ID of the species) directly above them in the table. Desired query output ID Clust code NewClust 2 2 ostric2 2 3 0 comost1 2 4 0 comost2 2 5 0 comost3 2 6 0 comost4 2 7 3 ostric3 3 8 0 y00934 3 10 4 grerhe1 4 11 0 grerhe2 4 12 0 grerhe3 4 13 0 grerhe4 4 14 0 grerhe5 4 15 0 grerhe6 4 16 5 lesrhe2 5 17 6 lesrhe4 6 18 0 lesrhe1 6 19 0 lesrhe5 6 20 7 lesrhe3 7 I tried a left join thinking that it would give me access to all of the records in the a table to find that record where a.id is < b.id and a.clust is non-zero. select b.id, b.clust, b.code, (select max(a.id) where a.clust<>0 and a.id<b.id) from db as a left join db as b on a.id=b.id; The query gives me a null value for Newclust whether I use max or not. I suspect it is due to forcing a.id=b.id. I can see how to do this with a procedure-oriented programming language but would like to use a mysql query. Thanks

Reputation: 1

Find the max ID value from records in the same table from a group of records that are non-zero in another field

I have a series of bird species names that are sorted taxonomically (by ID). I would like to cluster sub-species (which currently have a clust value of 0) with the species (clust <> 0) that falls above them in the table.

Here is a subset of the data and below is the desired query output to update the clust with newclust

Table Data

ID	Clust	code
2	2	ostric2
3	0	comost1
4	0	comost2
5	0	comost3
6	0	comost4
7	3	ostric3
8	0	y00934
10	4	grerhe1
11	0	grerhe2
12	0	grerhe3
13	0	grerhe4
14	0	grerhe5
15	0	grerhe6
16	5	lesrhe2
17	6	lesrhe4
18	0	lesrhe1
19	0	lesrhe5
20	7	lesrhe3

I'd like to fill in the 0 values in clust with the number (ID of the species) directly above them in the table.

Desired query output

ID	Clust	code	NewClust
2	2	ostric2	2
3	0	comost1	2
4	0	comost2	2
5	0	comost3	2
6	0	comost4	2
7	3	ostric3	3
8	0	y00934	3
10	4	grerhe1	4
11	0	grerhe2	4
12	0	grerhe3	4
13	0	grerhe4	4
14	0	grerhe5	4
15	0	grerhe6	4
16	5	lesrhe2	5
17	6	lesrhe4	6
18	0	lesrhe1	6
19	0	lesrhe5	6
20	7	lesrhe3	7

I tried a left join thinking that it would give me access to all of the records in the a table to find that record where a.id is < b.id and a.clust is non-zero.

select b.id, b.clust, b.code, (select max(a.id) where a.clust<>0 and a.id<b.id)   
    from db as a 
    left join db as b on a.id=b.id;

The query gives me a null value for Newclust whether I use max or not. I suspect it is due to forcing a.id=b.id. I can see how to do this with a procedure-oriented programming language but would like to use a mysql query.

Thanks

Upvotes: 0

Answers (2)

ValNik

Reputation: 5926

Usually such tasks are solved using window functions. Take a look at the topic "gaps and islands".
When using self-join, as in your example, you need to calculate the join condition.

See example

select t.*,coalesce(b.Clust,t.Clust) as newClust
from(
select a.*
  ,(select max(a2.id) from data a2 where a2.clust<>0 and a2.id<a.id) prevId
from data as a 
  )t
left join data as b on b.id=t.prevId

OR, same

select a.*,coalesce(b.Clust,a.Clust) as newClust
from data as a 
left join data as b 
  on b.id=(select max(a2.id) from data a2 where a2.clust<>0 and a2.id<a.id)

ID	Clust	code	prevId	newClust
2	2	ostric2	null	2
3	0	comost1	2	2
4	0	comost2	2	2
5	0	comost3	2	2
6	0	comost4	2	2
7	3	ostric3	2	2
8	0	y00934	7	3
10	4	grerhe1	7	3
11	0	grerhe2	10	4
12	0	grerhe3	10	4
13	0	grerhe4	10	4
14	0	grerhe5	10	4
15	0	grerhe6	10	4
16	5	lesrhe2	10	4
17	6	lesrhe4	16	5
18	0	lesrhe1	17	6
19	0	lesrhe5	17	6
20	7	lesrhe3	17	6

fiddle

With window funtions and method "gaps and islands", every row with Cust<>0 is first row of group - gap. Running sum of "first row of group" (gap number) is number of group.
Then we calculate new Clust as max(Clust) within group (partition by ...).

select *
  ,max(Clust)over(partition by clustGroup)newClust
from(
select *
  ,sum(case when Clust<>0 then 1 else 0 end)over(order by id)clustGroup
from data
)t

ID	Clust	code	clustGroup	newClust
2	2	ostric2	1	2
3	0	comost1	1	2
4	0	comost2	1	2
5	0	comost3	1	2
6	0	comost4	1	2
7	3	ostric3	2	3
8	0	y00934	2	3
10	4	grerhe1	3	4
11	0	grerhe2	3	4
12	0	grerhe3	3	4
13	0	grerhe4	3	4
14	0	grerhe5	3	4
15	0	grerhe6	3	4
16	5	lesrhe2	4	5
17	6	lesrhe4	5	6
18	0	lesrhe1	5	6
19	0	lesrhe5	5	6
20	7	lesrhe3	6	7

fiddle

Upvotes: 0

bandc

Reputation: 20

Use the Custom variables:

SELECT
ID,
Clust,
CODE,
case when Clust != 0 then @pre := Clust ELSE @pre END AS NewClust
FROM test,(SELECT @pre := NULL) tmp
ORDER BY ID

Upvotes: 0

Find the max ID value from records in the same table from a group of records that are non-zero in another field

Answers (2)

Related Questions