angrykoala

Reputation: 4054

Identify CSS selector string vs XPath string

I'm working on a small querying module (in JS) for HTML, and I want to provide a generic query(selector) function that accepts either a CSS selector or an XPath selector as its string argument.

Regardless of how each kind of selection is done, my problem is how to identify whether a given string is an XPath or a CSS selector. We can assume the function looks something like this:


function query(selector){
   const selectorKind = identifySelectorKind(selector); // I want to know how to code this particular function

   if(selectorKind==="css") return queryCss(selector);
   if(selectorKind==="xPath") return queryXPath(selector); // Assume both functions exist and work
}

My first approach (given my limited knowledge of XPath queries) was to identify the query kind by checking whether the first character is / (assuming that all relevant XPath queries begin with /).

So identifySelectorKind would look something like this:

function identifySelectorKind(selector){
    if (selector[0] === "/") return "xPath";
    else return "css";
}

Note that I don't need to validate either CSS or XPath selectors; I only need an unambiguous way to tell them apart. Would this logic be enough? In other words, do all XPath selectors begin with /, and does no CSS selector begin that way? If not, is there a better way, or are there considerations I should be aware of?

Upvotes: 0

Views: 757

Answers (3)

Thomas Di G

Reputation: 278

Searching only for / won't be enough, for sure!

Example CSS selector that would give a false positive:
nav [itemtype="https://schema.org/BreadcrumbList"]

I'm also writing a utility function that uses either querySelector or XPath, and I need to differentiate the two.

The problem here is that both syntaxes can contain arbitrary strings:
xpath: //*[contains(text(),"string")]
css: *[some-attr="string"]

...so whatever character you use to discriminate, it can always appear in both syntaxes (an XPath string inside a CSS quoted string is valid, and so is a CSS string inside an XPath quoted string):
xpath: //*[contains(text(),"a:hover:not(xpath)")]
css: *[xpath-attr="fuuu/xpath/also//here/*"]

The quick and dirty solution I found is to first cut out all the quoted strings, and then test for XPath-only characters (here, / or @).

const isXpath = str =>
    /[\/@]/.test(                       // look for / or @ in...
        str.split(/['"`]/)              // split on any quote character
            .filter((s, i) => !(i % 2)) // keep the even pieces (text outside quotes)
            .join('')                   // ...the string with quoted parts removed
    );


isXpath( 'nav [itemtype="https://schema.org/BreadcrumbList"] [itemtype="https://schema.org/ListItem"]' )
//> false
// The character test actually runs on "nav [itemtype=] [itemtype=]"

/!\ Note that this is not perfect, and some cases remain confusing: with the examples given in this discussion, such as * or div, it falls back to CSS (isXpath returns false). You could also improve the quoted-string removal (what about escaped quotes?) and the set of XPath-only characters...
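One way to handle the escaped-quote problem is to scan the string instead of splitting it, so that a quote only closes a literal opened by the same kind of quote, and a backslash-escaped quote does not end the literal. The stripQuotedStrings helper below is hypothetical and rests on two assumptions: only ' and " delimit strings (neither CSS nor XPath uses backticks), and backslash escapes follow CSS rules. XPath 1.0 literals have no escape syntax, so an XPath literal ending in a backslash would be misread.

```javascript
// Hypothetical helper: return the text of `str` that lies outside quoted
// string literals. Assumptions: only ' and " delimit literals, and a
// backslash escapes the next character (CSS rules; XPath 1.0 literals have
// no escapes, so an XPath literal ending in "\" would be misread).
function stripQuotedStrings(str) {
  let out = "";
  let quote = null; // quote character that opened the current literal, or null
  for (let i = 0; i < str.length; i++) {
    const ch = str[i];
    if (quote === null) {
      if (ch === '"' || ch === "'") {
        quote = ch; // a literal starts here
      } else {
        out += ch; // outside any literal: keep the character
      }
    } else if (ch === "\\") {
      i++; // skip the escaped character, even if it is a quote
    } else if (ch === quote) {
      quote = null; // only the matching quote closes the literal
    }
  }
  return out;
}

// Same idea as the split-based version: "/" or "@" outside literals means XPath.
const isXpath = (str) => /[\/@]/.test(stripQuotedStrings(str));

console.log(isXpath(`[title="it's"] a[href^="/"]`)); // false
```

On that last input the split-based version pairs the apostrophe in it's with the wrong quote, leaves the / of the second attribute outside the "quotes" and reports XPath; the scanner strips both literals, tests "[title=] a[href^=]" and reports CSS.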

Upvotes: 1

Alohci

Reputation: 82986

You can't, in general. For example, * is both a valid XPath expression and a valid CSS selector, but it matches a different set of elements in each.
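To make that concrete: evaluated with the document as its context node, the XPath * means child::* and selects only the root element, whereas document.querySelectorAll("*") returns every element. Since no classifier can resolve such inputs, one option is to detect the simplest both-valid cases and handle them explicitly (for instance by throwing, or by asking the caller to say which syntax is meant). The isAmbiguousSelector helper below is a hypothetical sketch of that idea; it only recognises * and a bare element name, and it is not a parser.

```javascript
// Hypothetical helper: detect inputs that parse as BOTH a CSS selector and an
// XPath expression but select different elements, so the caller can handle
// them explicitly instead of guessing. Only the simplest cases are covered:
// "*" and a bare element name such as "div". This is not a parser.
function isAmbiguousSelector(selector) {
  // "*" alone, or a single name starting with a letter or "_" followed by
  // letters, digits, "_" or "-" (surrounding whitespace is ignored)
  return /^\s*(\*|[A-Za-z_][\w-]*)\s*$/.test(selector);
}

console.log(isAmbiguousSelector("*"));       // true
console.log(isAmbiguousSelector("div > p")); // false
```

query() could call this first and refuse to guess, before falling back to whichever classifier it uses.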

Upvotes: 2

Jack Bashford

Reputation: 44107

If you're absolutely sure your XPath selectors will always begin with /, then yes, it's fine. Note that an XPath expression doesn't have to begin with / (for example, .//div, (//a)[1] and descendant::p are all valid), but if yours always select from the root, the check works.
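A slightly broader prefix check could look like the sketch below. It assumes (this is not part of the answer) that the only XPath forms you need are absolute paths (/...), paths relative to the context node (./... or ../...) and parenthesised expressions such as (//a)[1]; none of those prefixes should appear at the start of a valid CSS selector. It also skips leading whitespace, which the selector[0] check would trip over. Forms like descendant::p or a bare div still fall through to "css".

```javascript
// Sketch: classify a selector by its first non-space characters.
// Assumed XPath prefixes: "/" (absolute path), "./" or "../" (relative path)
// and "(" (parenthesised expression, e.g. "(//a)[1]").
const XPATH_PREFIX = /^\s*(\/|\.{1,2}\/|\()/;

function identifySelectorKind(selector) {
  return XPATH_PREFIX.test(selector) ? "xPath" : "css";
}

console.log(identifySelectorKind(".//a"));       // "xPath"
console.log(identifySelectorKind(".item > a"));  // "css"
```

It returns the same "xPath" / "css" strings as the function in the question, so it can be dropped into query() unchanged.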

Upvotes: 0
