Reputation: 371
I have an XML document:
<items>
<item>
<id>1</id>
<title>Title ABC Defg</title>
<author>Author Name</author>
<description>Description text </description>
</item>
...
</items>
And I would like to search the title, author, and description of each item and check whether any of them contains a phrase.
I don't know how to search all three at once and order the results by relevance, but that matters less than making a search for "Word" also match "word". I used this PHP code:
<?php
$xml = simplexml_load_file('file.xml');
$query = $_GET['query'];
$nodes = $xml->xpath("//item[contains(title,'$query')]");
$count = count($nodes);
for ($i = 1; $i <= $count; $i++) {
    $nodes = $xml->xpath("//item[contains(title,'$query')][$i]");
    foreach ($nodes as $node) {
        $title = $node->title;
        $desc = $node->description;
        $auth = $node->author; // the element is <author>, not <auth>
        $id = $node->id;
        echo "id: $id<br />title: $title<br />author: $auth<br />desc: $desc<p> </p>";
    }
}
?>
I know it searches only titles, but the problem is that when I search for "Word" it can't find "word", and I would like to get both "word" and "Word".
If you could also help me combine the search across author, title, and description and order the results somehow, I would really appreciate it.
EDIT:
I have managed to search in all tags (not only the specified ones, but that is fine for me),
so I have code like this:
$query = strtolower(rawurldecode($_GET['s']));
$nodes = $xml->xpath("//item[contains(translate(., 'ABCDEFGHIJKLMNOPQRSTUVWXYZ', 'abcdefghijklmnopqrstuvwxyz'),'$query')]"); // "." is the whole item, I suppose
I also do some validation of $query.
Upvotes: 2
Views: 6776
Reputation: 198101
So you want to know how to use XPath to select all <items><item> elements that contain the text you search for (I leave case sensitivity out; you will find that in the linked answers). First of all, all item elements:
//items/item
You already have that. To only return those that contain some text, add the predicate:
//items/item[contains(., 'XYZ')]
If you only want to search within the <title> child element:
//items/item[contains(title, 'XYZ')]
This is basically what you have already; however, you make your life needlessly hard. You don't need to run the query twice, you can just iterate over the matches directly:
$nodes = $xml->xpath("//items/item[contains(title, 'XYZ')]");
foreach ($nodes as $node)
{
foreach ($node as $name => $prop) {
printf("%s: %s\n", $name, $prop);
}
echo "\n";
}
Output:
id: 3
title: Title XYZ
author: Author Name
description: Description text
To learn about how to escape input to xpath (which is read-only, so this is not as dangerous as a SQL injection), consider the following example:
$query = 'XYZ';
$expression = sprintf("//item[contains(title,'%s')]", $query);
$nodes = $xml->xpath($expression);
It will create the following expression:
//item[contains(title,'XYZ')]
But what happens if the query contains a single quote? It terminates the string literal early and therefore creates an error:
$query = 'd\'oh';
Will give:
Warning: SimpleXMLElement::xpath(): Invalid expression in ...
You can prevent this by assigning the value to the document as an attribute and then comparing against that attribute in the expression:
$query = 'd\'oh';
$xml['query'] = $query;
$nodes = $xml->xpath("//item[contains(title, /*/@query)]");
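If you would rather not modify the document, another option is to build a proper XPath 1.0 string literal from the input. The following is only a sketch: `xpath_literal()` is a hypothetical helper name, not part of SimpleXML, and the inline XML is made-up sample data. It picks whichever quote character the value does not contain and falls back to `concat()` when the value contains both.

```php
<?php
// Hypothetical helper: turn an arbitrary PHP string into an XPath 1.0 string literal.
function xpath_literal(string $value): string
{
    if (strpos($value, "'") === false) {
        return "'" . $value . "'";   // no single quote: wrap in single quotes
    }
    if (strpos($value, '"') === false) {
        return '"' . $value . '"';   // no double quote: wrap in double quotes
    }
    // Both quote types present: split on single quotes and glue the parts
    // back together with concat('part', "'", 'part', ...).
    $parts = explode("'", $value);
    return "concat('" . implode("', \"'\", '", $parts) . "')";
}

// Made-up sample data for illustration.
$xml = simplexml_load_string(
    '<items><item><id>1</id><title>d\'oh moment</title></item></items>'
);

$query = "d'oh";
$expression = sprintf('//items/item[contains(title, %s)]', xpath_literal($query));
$nodes = $xml->xpath($expression);
printf("%d match(es)\n", count($nodes));
```

XPath 1.0 has no escape character inside string literals, which is why the `concat()` fallback is needed when both quote types appear.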
Old: You ask multiple questions at once:
Relevance is undefined. What is relevant for one person may be irrelevant for another, so it's hard to answer that part of your question without a specific definition of how relevance should be measured.
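Purely as an illustration of what such a definition could look like, the sketch below assumes that a match in the title counts more than one in the author, which counts more than one in the description. The weights, the `relevance_score()` helper, and the sample data are all assumptions made up for this example, not something the question specifies.

```php
<?php
// Assumed weighting per field; pick your own definition of relevance.
$weights = ['title' => 3, 'author' => 2, 'description' => 1];

// Hypothetical helper: weighted count of case-insensitive occurrences of $query.
function relevance_score(SimpleXMLElement $item, string $query, array $weights): int
{
    $needle = strtolower($query);
    if ($needle === '') {
        return 0; // substr_count() rejects an empty needle
    }
    $score = 0;
    foreach ($weights as $field => $weight) {
        $score += $weight * substr_count(strtolower((string) $item->{$field}), $needle);
    }
    return $score;
}

// Made-up sample data for illustration.
$xml = simplexml_load_string(
    '<items>'
    . '<item><id>1</id><title>Other</title><author>Nobody</author><description>One word here</description></item>'
    . '<item><id>2</id><title>Word games</title><author>A. Word</author><description>text</description></item>'
    . '<item><id>3</id><title>Nothing</title><author>X</author><description>Y</description></item>'
    . '</items>'
);

$query = 'word';
$scored = [];
foreach ($xml->item as $item) {
    $score = relevance_score($item, $query, $weights);
    if ($score > 0) {
        $scored[] = ['id' => (string) $item->id, 'score' => $score];
    }
}

// Highest score first.
usort($scored, function (array $a, array $b): int {
    return $b['score'] <=> $a['score'];
});

foreach ($scored as $row) {
    printf("id %s scored %d\n", $row['id'], $row['score']);
}
```

Here the scoring is done in PHP rather than in XPath, since XPath 1.0 has no sorting of its own.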
For case-insensitive search, duplicate questions have already been linked, so you should be able to do that. The best first duplicate in my eyes:
But here as well, it remains undefined what lower case and upper case mean for your data (which alphabet, which characters). You have not specified anything, so that part of your question cannot really be answered.
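As a sketch under one explicit assumption, namely that "case" means the ASCII letters A to Z only, the `translate()` approach can be combined with the attribute trick shown earlier and applied to `title`, `author`, and `description` at once. The sample XML and variable names below are made up for illustration.

```php
<?php
// Made-up sample data for illustration.
$xml = simplexml_load_string(
    '<items>'
    . '<item><id>1</id><title>Title ABC Defg</title><author>Author Name</author><description>Description text</description></item>'
    . '<item><id>2</id><title>Other</title><author>Someone</author><description>Nothing here</description></item>'
    . '</items>'
);

// Assumption: only ASCII letters are folded; accented letters are left untouched.
$upper = 'ABCDEFGHIJKLMNOPQRSTUVWXYZ';
$lower = 'abcdefghijklmnopqrstuvwxyz';

$query = 'abc';
// Store the lowercased query as an attribute on the root element, so no
// quoting of user input is needed inside the expression.
$xml['query'] = strtolower($query);

// Build one contains() test per field and join them with "or".
$tests = [];
foreach (['title', 'author', 'description'] as $field) {
    $tests[] = sprintf("contains(translate(%s, '%s', '%s'), /*/@query)", $field, $upper, $lower);
}
$expression = '//items/item[' . implode(' or ', $tests) . ']';

$nodes = $xml->xpath($expression);
foreach ($nodes as $node) {
    printf("id: %s\n", $node->id);
}

// Remove the temporary attribute again.
unset($xml['query']);
```

For anything beyond ASCII you would have to extend the two `translate()` alphabets yourself or do the comparison in PHP.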
Also you don't really validate your input:
$query = $_GET['query'];
$nodes = $xml->xpath("//item[contains(title,'$query')]");
It's possible to inject XPath here through the GET parameter. Take care, otherwise the query may not perform the search you intended at all.
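To make that concrete, here is a small sketch with made-up sample data and a crafted value standing in for `$_GET['query']`. The injected quote closes the string literal early and appends a condition that is always true, so every item is returned regardless of its title.

```php
<?php
// Made-up sample data for illustration.
$xml = simplexml_load_string(
    '<items><item><title>Alpha</title></item><item><title>Beta</title></item></items>'
);

// Stand-in for a crafted $_GET['query'] value.
$query = "x') or ('1'='1";

$expression = "//item[contains(title,'$query')]";
echo $expression, "\n";

$nodes = $xml->xpath($expression);
printf("%d item(s) returned\n", count($nodes));
```

Neither title contains "x", yet both items come back because of the appended `or ('1'='1')`.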
Upvotes: 4