Mog

Reputation: 1

How does a person work with the BlastXML2 namespace on import?

The XML schema (XSD) for the output is here: https://www.ncbi.nlm.nih.gov/data_specs/schema_alt/NCBI_BlastOutput2.xsd (-outfmt 16 on the BLAST+ command line).

The aim is to:

So far I have managed to filter on one search term and one blacklist item. But I had to do a very strange pathing to get there.

In the .xml document the path is simply Hit/description/HitDescr/title, for example. You can see below that I had to repeatedly use //*:title[1] or what have you, even once I'd pulled an item out and even when there's only one. This means the code breaks if I want to use 'Search' instead of 'Hit' and pull out the query name for the .csv.

I get an error saying title should be a single item, not a sequence, but I've specified title as [1] and it's doing my head in. The same error also occurs for bit-score if the database has more than one .xml file in it, for some strange reason. It worked for a database with exactly one .xml file in it.

declare namespace blast = "http://www.ncbi.nlm.nih.gov";
declare variable $searchTerm as xs:string external := "virus";
declare variable $blacklist as xs:string external := "Phage";
declare variable $bitscore as xs:int external := 50;


let $options := map { 
                     'format' : 'xquery',
                     'header': true(),
                     'separator': 'comma'
                     }  

let $hits := //*:Hit

let $hasParams := for $hit in $hits
                  where $hit//*:title[1][not(text() contains text {$blacklist})] and $hit//*:title[1][text() contains text {$searchTerm}] and $hit//*:bit-score[1][data() > $bitscore]
                  return $hit  

let $data := map { 
'names' : ['species name', 'bitscore'],
'records' : (for $entry in $hasParams
return[string($entry//*:title), string($entry//*:bit-score)]
)
}

return file:write(
  '/tmp/output.csv',
  csv:serialize($data, $options)
                 )

This works fine and forms a basis for building the .csv, which I was previously using Python for (slow, because I have an entire folder of .xml files to process at a time). It just seems wrong.

Upvotes: 0

Views: 46

Answers (1)

Michael Kay

Reputation: 163342

I suspect (but it's a bit of a guess because you don't describe the problem very precisely) that you're making the common mistake of writing $hit//*:title[1] when you meant ($hit//*:title)[1]. The former expression selects every title that is the first title child of its parent.
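To make the difference concrete, here is a minimal sketch. The inline Hit element is a made-up sample (not real BLAST output) that assumes one hit can carry two HitDescr entries; the namespace URI is the one declared in the question.

```xquery
(: Hypothetical sample data: one Hit with two HitDescr entries.
   The element names follow the path given in the question. :)
let $hit :=
  <Hit xmlns="http://www.ncbi.nlm.nih.gov">
    <description>
      <HitDescr><title>Virus A</title></HitDescr>
      <HitDescr><title>Virus B</title></HitDescr>
    </description>
  </Hit>
return (
  (: [1] binds to the child step, so this keeps every title that is
     the first title child of its own parent: both titles here. :)
  count($hit//*:title[1]),

  (: The parentheses build the whole sequence first, then [1] takes
     its first item in document order: exactly one title. :)
  count(($hit//*:title)[1]),

  (: Safe to pass to string(), which accepts at most one item. :)
  string(($hit//*:title)[1])
)
```

If that is what is happening, a call like string($entry//*:title) in the 'records' part would raise the "single item expected" type error whenever a hit has more than one title, and string(($entry//*:title)[1]) would avoid it. The same reasoning would apply to bit-score.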

Upvotes: 0
