Reputation: 3
There is an old static site with about 50 HTML pages. The question is: how do I implement a reasonably fast search? Which way should I look? I wrote a PHP script that simply searches for the text in the files, but it works really slowly. Are there methods for indexing the pages or something like that?
<?php
ini_set('max_execution_time', 900);

if (!isset($_GET['s'])) {
    die('You must define a search term!');
}

$search_in  = array('html', 'htm');
$search_dir = '.';
$countWords = 15;

// Escape the user input so it is treated literally inside the regex.
$term = preg_quote($_GET['s'], '/');

$files = list_files($search_dir);
$search_results = array();
foreach ($files as $file) {
    $contents = file_get_contents($file);
    // Find every paragraph that contains the search term.
    preg_match_all('/<p>(.*)' . $term . '(.*)<\/p>/i', $contents, $matches, PREG_SET_ORDER);
    foreach ($matches as $match) {
        $match[1]  = trim_result($match[1]);
        $match[2]  = trim_result($match[2], true);
        $match[1] .= '<span style="background: #ffff00;">';
        $match[2]  = '</span>' . $match[2];
        preg_match('/<title>(.*)<\/title>/', $contents, $matches2);
        $search_results[] = array($file, $match[1] . htmlspecialchars($_GET['s']) . $match[2], $matches2[1]);
    }
}
?>
<html>
<head>
    <title>Search results</title>
</head>
<body>
<?php foreach ($search_results as $result) : ?>
    <div>
        <h3><a href="<?php echo $result[0]; ?>"><?php echo $result[2]; ?></a></h3>
        <p><?php echo $result[1]; ?></p>
    </div>
<?php endforeach; ?>
</body>
</html>
<?php
function list_files($dir) {
    global $search_in;
    $result = array();
    if (is_dir($dir)) {
        if ($dh = opendir($dir)) {
            while (($file = readdir($dh)) !== false) {
                if (!($file == '.' || $file == '..')) {
                    $file = $dir . '/' . $file;
                    if (is_dir($file)) {
                        // Recurse into subdirectories.
                        $result = array_merge($result, list_files($file));
                    } elseif (in_array(get_file_extension($file), $search_in)) {
                        $result[] = $file;
                    }
                }
            }
            closedir($dh);
        }
    }
    return $result;
}

function get_file_extension($filename) {
    $result = '';
    $parts  = explode('.', $filename);
    if (is_array($parts) && count($parts) > 1) {
        $result = end($parts);
    }
    return $result;
}

function trim_result($text, $start = false) {
    global $countWords; // defined at the top of the script; was out of scope in the original
    // split() was removed in PHP 7; explode() does the same job here.
    $words = explode(' ', strip_tags($text));
    if ($start) {
        $words = array_slice($words, 0, $countWords);
    } else {
        $offset = count($words) - $countWords;
        $words  = array_slice($words, ($offset < 0 ? 0 : $offset), $countWords);
    }
    return implode(' ', $words);
}
?>
Upvotes: 0
Views: 296
Reputation: 26615
This isn't something you're going to solve (well) with a script that does all the work at request time.
You're going to want to pre-parse the pages into something that can be searched through quickly.
A simple method would be to parse it all into a text or JSON file. You can then load that one text file, search for your string and then handle it accordingly.
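For the flat-file route, a minimal sketch could look like the following. The index.json file name and the use of strip_tags() to get plain text are my own choices, not something from the question; the first loop runs once (or whenever a page changes) to build the index, the second runs per search.

<?php
// build-index.php - run once to turn all pages into a single JSON file
$index = array();
foreach (glob('*.html') as $file) {
    $html = file_get_contents($file);
    preg_match('/<title>(.*?)<\/title>/is', $html, $m);
    $index[] = array(
        'file'  => $file,
        'title' => isset($m[1]) ? $m[1] : $file,
        'text'  => strip_tags($html), // plain text only, so searching stays cheap
    );
}
file_put_contents('index.json', json_encode($index));

// search.php - load the one file and scan it in memory
$results = array();
foreach (json_decode(file_get_contents('index.json'), true) as $page) {
    if (stripos($page['text'], $_GET['s']) !== false) {
        $results[] = $page; // file + title are enough to render a hit
    }
}

Rebuilding the index is cheap for 50 pages, so you can simply regenerate it whenever the site changes.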
A more elegant method would be to use a SQL database (MySQL, SQLite, SQL Server, etc.) or a NoSQL database (MongoDB, Cassandra, etc.) to store the text and then run queries against it.
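For the database route, the query side can be very small. A sketch with PDO and SQLite, assuming a search.db file and a pages table with file, title and body columns that some indexing script has filled beforehand (both names are assumptions):

<?php
$db = new PDO('sqlite:search.db');
// A prepared statement keeps the user input out of the SQL itself.
$stmt = $db->prepare('SELECT file, title FROM pages WHERE body LIKE :q');
$stmt->execute(array(':q' => '%' . $_GET['s'] . '%'));
foreach ($stmt->fetchAll(PDO::FETCH_ASSOC) as $row) {
    echo '<a href="' . htmlspecialchars($row['file']) . '">'
       . htmlspecialchars($row['title']) . '</a><br>';
}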
Probably the best solution, though, would be to use Solr to allow for proper full-text searches. It's going to give the best results (and a lot of fine-tuning), but it may be overkill for your needs.
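For completeness, a hedged sketch of querying Solr's standard select handler over HTTP, assuming a Solr instance on localhost:8983 with a core named site that has already been fed the pages (the core name and the content field are assumptions, not something Solr creates for you):

<?php
$q   = urlencode($_GET['s']);
$url = 'http://localhost:8983/solr/site/select?q=content:' . $q . '&wt=json';
$response = json_decode(file_get_contents($url), true);
foreach ($response['response']['docs'] as $doc) {
    // Which fields come back here depends entirely on your Solr schema.
    echo htmlspecialchars($doc['id']) . '<br>';
}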
Upvotes: 0
Reputation: 1224
The best way to speed up the search is:
1. Parse all the files with a DOM parser and extract the content.
2. Write that content into an SQLite database (for only 50 pages you don't need MySQL).
3. Run the live search with simple SQL WHERE statements (see the sketch below).
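A rough sketch of those steps, assuming the pages sit in the script's directory and SQLite writes to a search.db file (both names are my assumptions):

<?php
// index.php - parse every page with DOMDocument and store the text in SQLite
$db = new PDO('sqlite:search.db');
$db->exec('CREATE TABLE IF NOT EXISTS pages (file TEXT, title TEXT, body TEXT)');
$insert = $db->prepare('INSERT INTO pages (file, title, body) VALUES (?, ?, ?)');

foreach (array_merge(glob('*.html'), glob('*.htm')) as $file) {
    $dom = new DOMDocument();
    @$dom->loadHTMLFile($file); // @ silences warnings from sloppy old HTML
    $titles = $dom->getElementsByTagName('title');
    $title  = $titles->length ? $titles->item(0)->textContent : $file;
    $body   = $dom->getElementsByTagName('body')->item(0);
    $insert->execute(array($file, $title, $body ? $body->textContent : ''));
}
// The live search is then one prepared WHERE statement, e.g.:
//   SELECT file, title FROM pages WHERE body LIKE '%term%';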
Upvotes: 1