Reputation: 2690
I am using a mysql database with a table with genomic islands in the format:
+----+-------+----------+----------+-----------------------------------------------+
| id | chrom | start | end | line_string |
+----+-------+----------+----------+-----------------------------------------------+
| 1 | 1 | 36568608 | 36569851 | ?? ?o?A ?? ?p?A |
| 2 | 1 | 82313020 | 82313491 | ?? ????A ?? L??A |
+----+-------+----------+----------+-----------------------------------------------+
The linestring is in the format: GeomFromText('Linestring(chrom start, chrom end)')
the numbers for "start" and "end" refer to basepair positions
I am currently selecting Island vs. non-Island regions in my python script using:
SELECT 'Island' as Island FROM islands
WHERE MBRIntersects(GeomFromText('Linestring(%d %d, %d %d)'), line_string)
UNION ALL SELECT 'non-Island' LIMIT 1 % (Chr, Start, Chr, End)
However, I would like to amend this query to also give island shores and shelfs defined as:
Island shore - 2,000 basepairs from Island
Island shelf - 2,000 to 4,000 basepairs from Island
Upvotes: 0
Views: 60
Reputation: 2690
I solved the problem by using:
SELECT 'Island' as Island FROM methylation.islands FORCE INDEX (locations)
WHERE MBRIntersects(GeomFromText('Linestring(%d %d, %d %d)'), line_string)
UNION ALL SELECT 'Shore' FROM methylation.islands FORCE INDEX (locations)
WHERE MBRIntersects(GeomFromText('Linestring(%d %d, %d %d)'), line_string)
UNION ALL SELECT 'Shelf' FROM methylation.islands FORCE INDEX (locations)
WHERE MBRIntersects(GeomFromText('Linestring(%d %d, %d %d)'), line_string)
UNION ALL SELECT 'Other' LIMIT 1
% (Chr, Start, Chr, End, Chr, Start-2000, Chr, End+2000, Chr, Start-4000, Chr, End+4000)
In this way anything that is an "Island" is listed as such, next if it's +/- 2,000 basepairs from an island it gets listed as a "shore", next if it's +/- 4,000 basepairs it's listed as a "shelf". Lastly everything else is considered "Other". By using a LIMIT 1
only the first found term is returned.
Upvotes: 1