kenorb
kenorb

Reputation: 166329

What's correct syntax to SELECT statement in YQL results?

I'm using boss.search table to extract the data in XML, but I would like to retrieve specific fields.

The standard syntax is:

SELECT * from boss.search WHERE q="stackoverflow" AND ck = 'ConsumerKey' AND secret = 'ConsumerSecret';

which works.

But when I'm trying to select fields like "/title" or "/results/result/title", I've got the syntax error:

Query syntax error(s) [line 1:7 expecting fields_or_star got '/result/title']

What's the correct syntax of retrieving fields (such as title)?

The REST response looks like:

<results>
    <bossresponse>
        <web>
            <results>
                <result>
                    <date/>
                    <clickurl>https://example.com/</clickurl>
                    <url>https://example.com/</url>
                    <dispurl type="default">https://example.com/</dispurl>
                    <title type="default">Example</title>
                    <abstract type="default">Example</abstract>
                </result>
                <result>

Upvotes: 1

Views: 604

Answers (1)

kenorb
kenorb

Reputation: 166329

You can check that at 'Syntax of SELECT - Specifying the Elements Returned' (cached) Yahoo Query Language documentation.

If the fields in the result set contain sub-fields, you can indicate the sub-fields by using periods (dots) as delimiters.

Custom tables

For example, for the social.profile table, to get only the imageUrl sub-field of the image field from the following example:

. . .
<results>
    <profile xmlns="http://social.yahooapis.com/v1/schema.rng">
        <image>
            <imageUrl>http://l.yimg.com/us.yimg.com/i/identity/nopic_192.gif</imageUrl>
        </image>
    </profile>
</results>

enter the following:

select image.imageUrl from social.profile where guid=me

BOSS API

In BOSS response it's a bit different. To get title from boss.search table, you need to name the field as: web.results.result.title, e.g.

SELECT web.results.result.title FROM boss.search WHERE q="stackoverflow"

HTML

To get just the content from an HTML page, you can specify content keyword after the word SELECT. A statement with the content keyword processes the HTML in the following order:

  1. It looks for any element named content within the elements found.
  2. If an element named content is not found, the statement looks for an attribute named content.
  3. If neither an element nor attribute named content is found, the statement returns the element's textContent.

For example, the following statement extracts only the HTML links (href tags) for Yahoo Groups:

SELECT href FROM html WHERE url="http://groups.yahoo.com/search?query=surfing&sort=relevance" and compat="html5" AND xpath='//li[contains(@class,"hbox groupsSearch-result-entry")]/h4/a'

The following statement, for example, returns the textContent of each anchor a tag retrieved by the XPath expression:

SELECT content FROM html WHERE url="http://groups.yahoo.com/search?query=surfing&sort=relevance" AND compat="html5" AND xpath='//li[contains(@class,"hbox groupsSearch-result-entry")]/h4/a'

Source: Extracting HTML - Using YQL and Open Data Tables at YQL Guide

Upvotes: 1

Related Questions