Rick Neubig
Rick Neubig

Reputation: 1

Find the max ID value from records in the same table from a group of records that are non-zero in another field

I have a series of bird species names that are sorted taxonomically (by ID). I would like to cluster sub-species (which currently have a clust value of 0) with the species (clust <> 0) that falls above them in the table.

Here is a subset of the data and below is the desired query output to update the clust with newclust

Table Data

ID Clust code
2 2 ostric2
3 0 comost1
4 0 comost2
5 0 comost3
6 0 comost4
7 3 ostric3
8 0 y00934
10 4 grerhe1
11 0 grerhe2
12 0 grerhe3
13 0 grerhe4
14 0 grerhe5
15 0 grerhe6
16 5 lesrhe2
17 6 lesrhe4
18 0 lesrhe1
19 0 lesrhe5
20 7 lesrhe3

I'd like to fill in the 0 values in clust with the number (ID of the species) directly above them in the table.

Desired query output

ID Clust code NewClust
2 2 ostric2 2
3 0 comost1 2
4 0 comost2 2
5 0 comost3 2
6 0 comost4 2
7 3 ostric3 3
8 0 y00934 3
10 4 grerhe1 4
11 0 grerhe2 4
12 0 grerhe3 4
13 0 grerhe4 4
14 0 grerhe5 4
15 0 grerhe6 4
16 5 lesrhe2 5
17 6 lesrhe4 6
18 0 lesrhe1 6
19 0 lesrhe5 6
20 7 lesrhe3 7

I tried a left join thinking that it would give me access to all of the records in the a table to find that record where a.id is < b.id and a.clust is non-zero.

select b.id, b.clust, b.code, (select max(a.id) where a.clust<>0 and a.id<b.id)   
    from db as a 
    left join db as b on a.id=b.id;

The query gives me a null value for Newclust whether I use max or not. I suspect it is due to forcing a.id=b.id. I can see how to do this with a procedure-oriented programming language but would like to use a mysql query.

Thanks

Upvotes: 0

Views: 53

Answers (2)

ValNik
ValNik

Reputation: 5926

Usually such tasks are solved using window functions. Take a look at the topic "gaps and islands".
When using self-join, as in your example, you need to calculate the join condition.

See example

select t.*,coalesce(b.Clust,t.Clust) as newClust
from(
select a.*
  ,(select max(a2.id) from data a2 where a2.clust<>0 and a2.id<a.id) prevId
from data as a 
  )t
left join data as b on b.id=t.prevId

OR, same

select a.*,coalesce(b.Clust,a.Clust) as newClust
from data as a 
left join data as b 
  on b.id=(select max(a2.id) from data a2 where a2.clust<>0 and a2.id<a.id)
ID Clust code prevId newClust
2 2 ostric2 null 2
3 0 comost1 2 2
4 0 comost2 2 2
5 0 comost3 2 2
6 0 comost4 2 2
7 3 ostric3 2 2
8 0 y00934 7 3
10 4 grerhe1 7 3
11 0 grerhe2 10 4
12 0 grerhe3 10 4
13 0 grerhe4 10 4
14 0 grerhe5 10 4
15 0 grerhe6 10 4
16 5 lesrhe2 10 4
17 6 lesrhe4 16 5
18 0 lesrhe1 17 6
19 0 lesrhe5 17 6
20 7 lesrhe3 17 6

fiddle

With window funtions and method "gaps and islands", every row with Cust<>0 is first row of group - gap. Running sum of "first row of group" (gap number) is number of group.
Then we calculate new Clust as max(Clust) within group (partition by ...).

select *
  ,max(Clust)over(partition by clustGroup)newClust
from(
select *
  ,sum(case when Clust<>0 then 1 else 0 end)over(order by id)clustGroup
from data
)t
ID Clust code clustGroup newClust
2 2 ostric2 1 2
3 0 comost1 1 2
4 0 comost2 1 2
5 0 comost3 1 2
6 0 comost4 1 2
7 3 ostric3 2 3
8 0 y00934 2 3
10 4 grerhe1 3 4
11 0 grerhe2 3 4
12 0 grerhe3 3 4
13 0 grerhe4 3 4
14 0 grerhe5 3 4
15 0 grerhe6 3 4
16 5 lesrhe2 4 5
17 6 lesrhe4 5 6
18 0 lesrhe1 5 6
19 0 lesrhe5 5 6
20 7 lesrhe3 6 7

fiddle

Upvotes: 0

bandc
bandc

Reputation: 20

Use the Custom variables:

SELECT
ID,
Clust,
CODE,
case when Clust != 0 then @pre := Clust ELSE @pre END AS NewClust
FROM test,(SELECT @pre := NULL) tmp
ORDER BY ID

Upvotes: 0

Related Questions