Reputation: 1047
I'm following http://simplehtmldom.sourceforge.net/ to build a web crawler in PHP, but I'm confused about how to search for a word without specifying an element, so that the search covers all the data on the page.
The problem is that I currently restrict the search to <p> elements, so when the word is not inside a <p> element the result is empty.
This is my code:
<?php
include "simple_html_dom.php";
$html = file_get_html('https://adityadees.blogspot.com/');
// Selectors are CSS-style, so the tag name is written without angle brackets.
foreach ($html->find('p') as $element) {
    if (strpos($element, 'yang') !== false) {
        echo $element;
    } else {
        echo $element;
    }
}
?>
For example, I want to search for text containing 'yang', but the results are empty because those words are not inside a <p> element.
When the word is inside a <p> element, the search works fine.
I tried changing this line
foreach ($html->find('p') as $element)
to
foreach ($html->find() as $element)
but I got an error like this:
Fatal error: Uncaught ArgumentCountError: Too few arguments to function simple_html_dom::find(), 0 passed in C:\xampp\htdocs\crawl\index.php on line 5 and at least 1 expected in C:\xampp\htdocs\crawl\simple_html_dom.php:1975 Stack trace: #0 C:\xampp\htdocs\crawl\index.php(5): simple_html_dom->find() #1 {main} thrown in C:\xampp\htdocs\crawl\simple_html_dom.php on line 1975
Upvotes: 1
Views: 2993
Reputation: 5471
Do you want to find all paragraphs or text that contain your given word?
<?php
include('simple_html_dom.php');
$html = file_get_html('https://adityadees.blogspot.com/');
$strings_array = array();
// Search every tag (*) whose plain text contains "yang".
foreach ($html->find('*[plaintext*=yang]') as $element) {
    // Keep only elements without child nodes, i.e. the innermost ones in the recursion.
    if ($element->firstChild() == null) {
        // Duplicate strings still occur, so add only unique values to the array.
        if (!in_array($element->innertext, $strings_array)) {
            $strings_array[] = $element->innertext;
        }
    }
}
echo '<pre>';
print_r($strings_array);
echo '</pre>';
?>
It isn't a final solution, but it is something to start with. At least it finds the word yang 61 times, the same as in the HTML source of the given page.
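The same idea (search every piece of text, whatever tag wraps it) can also be sketched without the library, using PHP's built-in DOMDocument and DOMXPath. This is only a minimal sketch on an inline sample string: the markup in $source and the helper name find_text_containing are hypothetical, and on the real page you would pass in the downloaded HTML instead.

```php
<?php
// Hypothetical sample markup standing in for the fetched page.
$source = '<html><body>'
    . '<div class="item-snippet">itulah yang selalu dipertanyakan</div>'
    . '<span>tidak ada</span>'
    . '<p>Hal yang pertama</p>'
    . '</body></html>';

// Return the unique text nodes that contain $word, regardless of the enclosing tag.
function find_text_containing(string $source, string $word): array
{
    $doc = new DOMDocument();
    // Real-world HTML is rarely valid, so collect parser warnings instead of printing them.
    libxml_use_internal_errors(true);
    $doc->loadHTML($source);
    libxml_clear_errors();

    $xpath = new DOMXPath($doc);
    $matches = [];
    // Select every text node except those inside <script> or <style>.
    $query = '//text()[not(ancestor::script) and not(ancestor::style)]';
    foreach ($xpath->query($query) as $node) {
        $text = trim($node->nodeValue);
        if ($text !== '' && strpos($text, $word) !== false) {
            $matches[$text] = true; // array keys de-duplicate repeated strings
        }
    }
    return array_keys($matches);
}

print_r(find_text_containing($source, 'yang'));
```

Because it works on text nodes rather than elements, a parent and its child are never both reported for the same string, which is what the firstChild() check above is guarding against.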
Upvotes: 1
Reputation: 5471
Upon inspecting the source of the given page, you can see that each post summary is inside a div tag with the class item-snippet:
<div class='item-snippet'> Bagaimana Cara Mengganti Akun Mobile Legend ? itulah yang selalu dipertanyakan oleh orang yang baru memulai bermain game Mobile Legend. S...</div>
You can get your result if you search for your word in those divs:
include('simple_html_dom.php');
$html = file_get_html('https://adityadees.blogspot.com/');
foreach ($html->find('div[class=item-snippet]') as $element) {
    if (strpos($element, 'yang') !== false) {
        echo $element;
    }
}
Result:
Bagaimana Cara Mengganti Akun Mobile Legend ? itulah yang selalu dipertanyakan oleh orang yang baru memulai bermain game Mobile Legend. S...
Bagaimana Cara Mengaitkan Akun Mobile Legend di Patch Baru ? Mungkin masih ada yang bingung tentang cara mengaitkan akun mobile legend den...
Kali ini kita akan membahas tentang bagaimana cara menghitung luas persegi panjangan dengan PHP Hal yang pertama dilakukan adalah membuat ...
Is this what you are looking for?
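For comparison, a class-restricted search like this can be sketched with PHP's built-in DOMXPath instead of the library. It is a rough sketch on a hypothetical inline sample; the markup in $source and the function name snippets_containing are assumptions for illustration only.

```php
<?php
// Hypothetical sample markup; only the first div has both the class and the word.
$source = '<div class="item-snippet">itulah yang selalu</div>'
    . '<div class="item-snippet">tanpa kata itu</div>'
    . '<div class="post-title">judul yang lain</div>';

// Return the text of every div.item-snippet that contains $word.
function snippets_containing(string $source, string $word): array
{
    $doc = new DOMDocument();
    libxml_use_internal_errors(true); // tolerate imperfect HTML quietly
    $doc->loadHTML($source);
    libxml_clear_errors();

    $xpath = new DOMXPath($doc);
    // Match "item-snippet" as a whole class token, even among several classes.
    $query = "//div[contains(concat(' ', normalize-space(@class), ' '), ' item-snippet ')]";
    $found = [];
    foreach ($xpath->query($query) as $div) {
        $text = trim($div->textContent);
        if (strpos($text, $word) !== false) {
            $found[] = $text;
        }
    }
    return $found;
}

print_r(snippets_containing($source, 'yang'));
```

The trade-off is the same as in the answer: restricting the search to one class gives clean results, but it depends on the page keeping that class name.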
Upvotes: 0
Reputation: 1278
How about:
foreach ($html->find('body') as $element) {
    if (strpos($element, 'yang') !== false) {
        echo $element;
    } else {
        echo $element;
    }
}
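If the goal is only to know whether (or how often) the word appears anywhere in the page text, a rough alternative is to strip the tags first and search the remaining plain text. This is a sketch using PHP's built-in strip_tags and substr_count; the sample markup and the function name count_word_in_html are hypothetical.

```php
<?php
// Hypothetical markup standing in for the downloaded page.
$source = '<body><h1>Judul</h1><div>itulah yang selalu</div><p>orang yang baru</p></body>';

// Remove all tags so the search runs on text only, then count the matches.
function count_word_in_html(string $source, string $word): int
{
    $text = html_entity_decode(strip_tags($source));
    return substr_count($text, $word);
}

echo count_word_in_html($source, 'yang'); // prints 2
```

Note that strip_tags removes only the tags themselves, so text inside script or style elements would still be counted on a real page.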
Upvotes: 0