Reputation: 18743
First of all, have a look at this page:
www.zedge.net/txts/4519/
It lists a lot of text messages. I want my script to open each message and download it, but I am having some problems.
This is my simple script to open the page:
<?php
$ch = curl_init();
curl_setopt($ch, CURLOPT_URL, "http://www.zedge.net/txts/4519");
// Must be set before curl_exec(), otherwise the page is printed instead of returned
curl_setopt($ch, CURLOPT_RETURNTRANSFER, 1);
$contents = curl_exec($ch);
curl_close($ch);
?>
The page downloads fine, but how would I open every text message page linked from it, one by one, and save each one's content in a text file? I know how to save the content of a single webpage to a text file using curl, but here the page I've downloaded links to many different pages. How do I open them one by one, separately?
I have this idea, but I don't know if it will work:
Download this page,
www.zedge.net/txts/4519
look for all the links to text message pages inside it and save each link to a text file (one per line). Then run another curl session, open the text file, read the links one by one, open each one, copy the content from the particular DIV and save it in a new file.
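To make the idea above concrete, here is a minimal sketch of the "collect links into a file, then read them back" part. It runs against an inline HTML string rather than the real site, and the helper names (extractLinks, saveLinks, readLinks) and the sample hrefs are made up for illustration.

```php
<?php
// Phase 1 helper: pull every href out of an HTML string.
function extractLinks(string $html): array
{
    $dom = new DOMDocument();
    libxml_use_internal_errors(true);   // real-world HTML is rarely valid
    $dom->loadHTML($html);
    libxml_clear_errors();

    $links = [];
    foreach ($dom->getElementsByTagName('a') as $a) {
        $href = $a->getAttribute('href');
        if ($href !== '') {
            $links[] = $href;
        }
    }
    // Drop duplicates and reindex from 0
    return array_values(array_unique($links));
}

// Phase 1: save one link per line.
function saveLinks(array $links, string $file): void
{
    file_put_contents($file, implode("\n", $links) . "\n");
}

// Phase 2: read the links back, skipping blank lines.
function readLinks(string $file): array
{
    return file($file, FILE_IGNORE_NEW_LINES | FILE_SKIP_EMPTY_LINES);
}

// Demo on an invented page; a real run would pass the curl result instead.
$sample = '<html><body><a href="/txt/1">one</a>'
        . '<a href="/txt/2">two</a><a href="/txt/1">dup</a></body></html>';
$file = tempnam(sys_get_temp_dir(), 'links');
saveLinks(extractLinks($sample), $file);
$links = readLinks($file);
print_r($links);
```

In a real run the second phase would loop over $links and fetch each one with curl. The intermediate file is optional, since the array could be used directly.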
Upvotes: 1
Views: 5659
Reputation: 181
I used DOM for my code. I loaded the page I wanted and filtered the data using getElementsByTagName('td').
In my case I want the status of my relays from the device page, and I want the updated status every time. For that I used the code below.
$keywords = array();
$domain = array('http://USERNAME:PASSWORD@URL/index.htm');
$doc = new DOMDocument;
$doc->preserveWhiteSpace = FALSE;
foreach ($domain as $key => $value) {
    @$doc->loadHTMLFile($value);
    //$anchor_tags = $doc->getElementsByTagName('table');
    //$anchor_tags = $doc->getElementsByTagName('tr');
    $anchor_tags = $doc->getElementsByTagName('td');
    foreach ($anchor_tags as $tag) {
        $keywords[] = strtolower($tag->nodeValue);
        //echo $keywords[0];
    }
}
Then I get my desired relay names and statuses in the $keywords[] array.
Here I am sharing the output.
If you want to read all the messages on the main page, you first have to collect the links to the separate messages. Then you can run the same process on each of them.
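As an illustration of the td approach, here is a small sketch that runs against an inline HTML string instead of a device page. The table contents and the $status array are invented for the example, and it assumes the cells alternate name, state.

```php
<?php
// Invented device page: relay names and states in table cells.
$html = '<table>'
      . '<tr><td>Relay 1</td><td>ON</td></tr>'
      . '<tr><td>Relay 2</td><td>OFF</td></tr>'
      . '</table>';

$doc = new DOMDocument();
libxml_use_internal_errors(true);   // do not emit warnings for sloppy HTML
$doc->loadHTML($html);
libxml_clear_errors();

// Collect the lower-cased text of every <td>, as in the code above.
$keywords = [];
foreach ($doc->getElementsByTagName('td') as $td) {
    $keywords[] = strtolower(trim($td->nodeValue));
}

// Assumption: cells come in (name, state) pairs, so pair them up.
$status = [];
for ($i = 0; $i + 1 < count($keywords); $i += 2) {
    $status[$keywords[$i]] = $keywords[$i + 1];
}
print_r($status);
```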
Upvotes: 2
Reputation: 20997
The algorithm is pretty straightforward:
- download www.zedge.net/txts/4519 with curl
- parse it with DOMDocument and filter the links
- request each link
// Load main page
$ch = curl_init();
curl_setopt($ch, CURLOPT_RETURNTRANSFER, 1);
curl_setopt($ch, CURLOPT_URL, "http://www.zedge.net/txts/4519");
$contents = curl_exec($ch);
$dom = new DOMDocument();
@$dom->loadHTML($contents); // @ hides warnings about malformed HTML
// Filter all the links ('myLink' is a placeholder class name)
$xPath = new DOMXPath($dom);
$items = $xPath->query('//a[@class="myLink"]');
foreach ($items as $link) {
    $url = $link->getAttribute('href');
    if (strncmp($url, 'http', 4) != 0) {
        // Prepend http:// or something
    }
    // Open sub request for this link, reusing the same handle
    curl_setopt($ch, CURLOPT_URL, $url);
    $subContent = curl_exec($ch);
}
curl_close($ch);
See the documentation and examples for DOMXPath::query. Note that DOMNodeList implements Traversable, and therefore you can use foreach on it.
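Two pieces the code leaves open are turning a relative href into a full URL and pulling the message text out of a sub page. A rough sketch of both follows. The function names, the "message" class and the /txt/42 path are hypothetical, and absoluteUrl treats every relative href as root-relative, which is a simplification.

```php
<?php
// Turn a possibly relative href into an absolute URL (simple cases only).
function absoluteUrl(string $href, string $base): string
{
    if (preg_match('#^https?://#i', $href)) {
        return $href;                       // already absolute
    }
    $parts = parse_url($base);
    $root  = $parts['scheme'] . '://' . $parts['host'];
    return $root . '/' . ltrim($href, '/'); // assumes a root-relative href
}

// Return the trimmed text of the first <div> whose class attribute
// is exactly $class, or null if there is none.
function textOfDiv(string $html, string $class): ?string
{
    $dom = new DOMDocument();
    libxml_use_internal_errors(true);
    $dom->loadHTML($html);
    libxml_clear_errors();

    $xPath = new DOMXPath($dom);
    $nodes = $xPath->query('//div[@class="' . $class . '"]');
    return $nodes->length > 0 ? trim($nodes->item(0)->textContent) : null;
}

// Demo on invented markup; the real class name has to be read from the site.
$page = '<div class="menu">skip</div><div class="message">Hello there</div>';
$text = textOfDiv($page, 'message');
$url  = absoluteUrl('/txt/42', 'http://www.zedge.net/txts/4519');
echo $url, "\n", $text, "\n";
```

Inside the foreach loop, $url would go through absoluteUrl before the sub request, and $subContent would be passed to textOfDiv before being written out with file_put_contents.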
Tips:
- COOKIE_JAR_FILE
- sleep(...) between requests, so as not to flood the server
Upvotes: 3