RG20
RG20

Reputation: 189

How to extract multiple rows from a table based on values from multiple columns from another table and then concatenate in SQL?

I have two tables, Table 1 and Table 2. Table 1 have columns "start" and "end" . Table 2 has column "position" and "Sequence". I would like to extract the sequences from Table 2 from position = start to position = end and the create a new column with the concatenated string.

Table 1

Start End
100 104
105 109

Table 2

Position Seq
100 A
101 T
102 C
103 T
104 G
105 T
106 T
107 G
108 T
109 G

My final result needs to be

Start End Sequence
100 104 ATCTG
105 109 TTGTG

I tried concatenating the values in the Table 2 using the below statement

 SELECT Sequence = (Select '' + Seq 
 from Table2
 where Position >= 100 and Position <= 104
 order by Position FOR XML PATH('')

)

Upvotes: 1

Views: 997

Answers (4)

Ajax1234
Ajax1234

Reputation: 71451

You can generate row numbers for your first table which can later be used to group the ranges after joining on those numbers:

with to_id as (select row_number(*) over (order by t1.start) id, t1.* from table1 t1), 
     ranges as (select t3.id, t2.* from table2 t2 join to_id t3 on t3.start <= t2.position and t2.position <= t3.end)
select t3.start, t3.end, group_concat(r1.seq, '') from ranges r1 join to_id t3 on r1.id = t3.id group by r1.id;

Upvotes: 0

snj
snj

Reputation: 100

Look into how crosstab queries are done.

Upvotes: -1

SE1986
SE1986

Reputation: 2740

You don't state what DBMS you are using so here is a SQL Server solution using a CTE and FOR XML to perform the transpose:

; WITH SequenceCTE AS
(
    SELECT  [Start],
            [End],
            Seq
    FROM    Table1 a
            JOIN Table2 b
                ON b.Position >= a.[Start] AND
                  b.Position <= a.[End]
)
SELECT  DISTINCT
        a.[Start],
        a.[End],
        (
            SELECT  STUFF(',' + Seq,1,1,'')
            FROM    SequenceCTE b
            WHERE   a.[Start] = b.[Start] AND
                    a.[End] = b.[end]
            FOR XML PATH ('') 
        )
FROM    SequenceCTE a

Upvotes: 1

Gordon Linoff
Gordon Linoff

Reputation: 1269603

In standard SQL, you can do something like this:

select t1.start, t1.end,
       listagg(t2.position, '') within group (order by t2.seq) as sequence
from table1 t1 join
     table2 t2
     on t2.position between t1.start and t2.end
group by t1.start, t1.end;

Most databases support aggregate string concatenation, but the function may have a different name and slightly different syntax.

Note that start and end are poor names for columns because they are SQL keywords -- as is sequence in most databases.

Upvotes: 0

Related Questions