user2165857
user2165857

Reputation: 2690

Select Intersecting and flanking regions in MySQL query

I am using a mysql database with a table with genomic islands in the format:

+----+-------+----------+----------+-----------------------------------------------+
| id | chrom | start    | end      | line_string                                   |
+----+-------+----------+----------+-----------------------------------------------+
|  1 |     1 | 36568608 | 36569851 |                 ??    ?o?A      ??   ?p?A       |
|  2 |     1 | 82313020 | 82313491 |                 ??   ????A      ??   L??A         |
+----+-------+----------+----------+-----------------------------------------------+

The linestring is in the format: GeomFromText('Linestring(chrom start, chrom end)') the numbers for "start" and "end" refer to basepair positions

I am currently selecting Island vs. non-Island regions in my python script using:

SELECT 'Island' as Island FROM islands 
WHERE MBRIntersects(GeomFromText('Linestring(%d %d, %d %d)'), line_string) 
UNION ALL SELECT 'non-Island' LIMIT 1 % (Chr, Start, Chr, End)

However, I would like to amend this query to also give island shores and shelfs defined as:

Island shore - 2,000 basepairs from Island

Island shelf - 2,000 to 4,000 basepairs from Island

Upvotes: 0

Views: 60

Answers (1)

user2165857
user2165857

Reputation: 2690

I solved the problem by using:

SELECT 'Island' as Island FROM methylation.islands FORCE INDEX (locations)
            WHERE MBRIntersects(GeomFromText('Linestring(%d %d, %d %d)'), line_string) 
        UNION ALL SELECT 'Shore' FROM methylation.islands FORCE INDEX (locations) 
            WHERE MBRIntersects(GeomFromText('Linestring(%d %d, %d %d)'), line_string)
        UNION ALL SELECT 'Shelf' FROM methylation.islands FORCE INDEX (locations) 
            WHERE MBRIntersects(GeomFromText('Linestring(%d %d, %d %d)'), line_string)
        UNION ALL SELECT 'Other' LIMIT 1 
% (Chr, Start, Chr, End, Chr, Start-2000, Chr, End+2000, Chr, Start-4000, Chr, End+4000)

In this way anything that is an "Island" is listed as such, next if it's +/- 2,000 basepairs from an island it gets listed as a "shore", next if it's +/- 4,000 basepairs it's listed as a "shelf". Lastly everything else is considered "Other". By using a LIMIT 1 only the first found term is returned.

Upvotes: 1

Related Questions