Reputation: 1293
I believe the markup of the page is part of the issue I am having, so I think I need to post the source, a JSFiddle, and the original GIS page.
I am trying to get info such as Name: and Address: from the table at the bottom.
Attempt at a solution:
I wrote the following code hoping to see all the table data, yet the table I'm looking to get data from returns nothing.
<?php
$k=0;
$num=1000;
var_dump(libxml_use_internal_errors(true));
$domOb = new DOMDocument();
$html = @$domOb->loadHTMLFile('http://www.gis.catawba.nc.us/website/Parcel/parcel_main.asp?Cmd=query&key=372215634301&type=P');
$domOb->preserveWhiteSpace = false;
$items = $domOb->getElementsByTagName('td');
while ($k<(int)$num){
echo $items->item($k++)->nodeValue.'<br>';
};
?>
All that was returned was:
bool(false) Real Estate Search - Legacy Map Layers visible FAQ's Help GIS Home
So I'm hoping someone can tell me what I'm doing wrong to miss all the data I'm looking for? How can I pull just the name and address as easily/simply as possible?
I also attempted the following using XPath, but I get lots of warnings:
$dom = new DOMDocument;
$dom->load('http://www.gis.catawba.nc.us/website/Parcel/parcel_main.asp?Cmd=query&key=372215634301&type=P');
$s = simplexml_import_dom($dom);
echo $name = $s->xpath('//table[@class="words13]/td[contains(text(), "Name:")]');
echo $add = $s->xpath('//table[@class="words13]/td[contains(text(), Address:)]');
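A few things in the XPath attempt above are worth noting: `load()` parses its input as XML rather than HTML, the `@class` test is missing its closing quote, a `td` is a child of a `tr` and not a direct child of a `table`, and `xpath()` returns an array, so `echo` cannot print it usefully. The following is only a minimal sketch of the same idea using `DOMXPath`. The sample markup and the `cellValue()` helper are assumptions made for illustration; the real page's structure may differ.

```php
<?php
// Hypothetical sample markup standing in for the real page (an assumption).
$html = '<table class="words13">
  <tr><td>Name: PENLEY TREASURE B</td><td>Name2:</td></tr>
  <tr><td>Parcel Address: 3152 7TH AV SE</td><td>Address: 5508 SWINGING BRIDGE RD</td></tr>
</table>';

/**
 * Hypothetical helper: return the text that follows $label in the first
 * <td> whose text starts with $label, or null when no such cell exists.
 */
function cellValue(DOMXPath $xpath, string $label): ?string
{
    // The labels used here contain no double quotes, so they can be
    // embedded in the expression directly.
    $query = sprintf('//td[starts-with(normalize-space(.), "%s")]', $label);
    $nodes = $xpath->query($query);
    if ($nodes === false || $nodes->length === 0) {
        return null;
    }
    $text = trim($nodes->item(0)->textContent);
    return trim(substr($text, strlen($label)));
}

libxml_use_internal_errors(true); // keep warnings about sloppy HTML quiet
$dom = new DOMDocument();
$dom->loadHTML($html);
$xpath = new DOMXPath($dom);

$name = cellValue($xpath, 'Name:');
$address = cellValue($xpath, 'Address:');
echo "Name: $name\nAddress: $address\n";
```

Using `starts-with()` instead of `contains()` keeps `Address:` from also matching the `Parcel Address:` cell.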
Using the code by user2518542 combined with hakre's code, I get the following:
$ch = curl_init();
curl_setopt($ch, CURLOPT_URL,"http://www.gis.catawba.nc.us/website/Parcel/parcel_main.asp?Cmd=QUERY&key=372215634301&type=P&width=1280&height=923");
curl_setopt($ch, CURLOPT_TIMEOUT, 30); //timeout after 30 seconds
curl_setopt($ch, CURLOPT_RETURNTRANSFER,1); //return the response instead of printing it
$result=curl_exec ($ch);
curl_close ($ch);
$doc = new DOMDocument(); //the document object must exist before loadHTML() is called
libxml_use_internal_errors(true); //suppress warnings about malformed HTML
$doc->loadHTML($result);
$tds = $doc->getElementsByTagName('td');
foreach($tds as $td) {
printf(" * %s\n", $td->textContent);
echo '<br>';
}
The code above successfully prints out the contents of all the td tags.
Upvotes: 0
Views: 447
Reputation: 198214
The table cells you are looking for are not part of that HTML document. First of all, you need to understand the basics of web scraping; I suggest you borrow some books on the topic and read through them.
Time for the library ;)
In case the table cells are in the document (it seems to vary: sometimes they are, sometimes they are not), your original example does show them. The following also demonstrates how to iterate over a DOMNodeList:
$doc = new DOMDocument();
libxml_use_internal_errors(true);
$doc->loadHTMLFile('Catawba County Legacy Map Server.html');
$tds = $doc->getElementsByTagname('td');
foreach($tds as $td) {
printf(" * %s\n", $td->textContent);
}
Example output:
php "test.php" (in directory: /home/hakre/php/test)
*
* Real Estate Search - Legacy
*
*
*
*
*
*
*
*
*
* Map Layers
* visible
*
*
* Parcels
*
* Parcel Annotation
*
* Address Points
*
* Misc. Lines
*
* Structures
*
* Contour Lines
*
* Soils
*
* Townships
*
* Water Features
*
* Tiles
*
* Flood Zone
*
* Agricultural District
*
* Aerial 2009
*
* Aerial 2005
*
* Aerial 2002
*
* Cities
*
* Print the Map
* Print Map and Parcel Report
* Print the Parcel Report
* Assessment Report
* List all Owners
* Deed History Report
* Parcel Information:
* Owner Information:
* Parcel ID: 372215634301
* Name: PENLEY TREASURE B
* Parcel Address: 3152 7TH AV SE
* Name2:
* City: CONOVER 28613
* Address: 5508 SWINGING BRIDGE RD
* LRK(REID): 57186
* Address2:
* Deed Book/Page: 1906/0741 Deed Image
* City: CONOVER
* Subdivision: FOREST HGTS
* State/Zip: NC 28613-7415
* Lots: 1-4
*
* Block: C
*
* Last Sale:
* School Information:
* Plat Book/Page: 8/119 Plat Image
* School District: COUNTY
* Calculated Acreage: 0.31
* Elementary School: WEBB A MURRAY
* Tax Map: 167H 04006A
* Middle School: ARNDT
* State Road:
* High School: ST STEPHENS
* Township: HICKORY
* School Map
*
*
* Tax/Value Information: Tax Rates(pdf)
* Zoning Information:
* Municipal Tax District:
* Zoning District: HICKORY
* Fire District: HICKORY RURAL
* Zoning1: OI
* Tax Account Number:
* Zoning2:
* Market Building(s) Value: $55,400
* Zoning3:
* Market Land Value: $20,300
* Zoning Overlay:
* Market Total Value: $75,700
* Small Area:
* Year Built/Remodeled: 1959
* Split Zoning District 1/2: 0/0
* Current Tax Bill
* Zoning Agency Phone Numbers
* Miscellaneous:
*
* Voter Precinct:P35
* Firm Panel Date: 9/5/2007
* Building Permits for this parcel
* Firm Panel #: 3710372200J
* WaterShed:
* 2010 Census Tract: 011000
* WaterShed Split:
* 2010 Census Block: 3035
* Parcel Report Data Descriptions
* Agricultural District:
* FAQ's
* Help
* GIS Home
Compilation finished successfully.
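Since each cell's text in the output above has the shape "Label: value", one way to get at just the name and address is to split every cell on its first colon and collect the results in an array. This is only a sketch: the `cellsToMap()` helper is a hypothetical name, and the sample strings are a handful of cell texts copied from the output above rather than a live fetch.

```php
<?php
/**
 * Hypothetical helper: turn "Label: value" cell texts into label => value.
 * The first occurrence of a label wins, because labels such as "City" repeat.
 */
function cellsToMap(iterable $texts): array
{
    $map = [];
    foreach ($texts as $text) {
        $parts = explode(':', trim($text), 2);
        if (count($parts) !== 2) {
            continue; // no colon, so this is not a label/value cell
        }
        [$label, $value] = array_map('trim', $parts);
        if ($label !== '' && !array_key_exists($label, $map)) {
            $map[$label] = $value;
        }
    }
    return $map;
}

// A few cell texts taken from the output above.
$cells = [
    'Parcel ID: 372215634301',
    'Name: PENLEY TREASURE B',
    'Parcel Address: 3152 7TH AV SE',
    'Name2:',
    'City: CONOVER 28613',
    'Address: 5508 SWINGING BRIDGE RD',
    'City: CONOVER',
    "FAQ's",
];

$info = cellsToMap($cells);
echo $info['Name'], "\n", $info['Address'], "\n";
```

With a real document, the input could be built by appending `$td->textContent` to an array inside the `foreach` loop shown earlier.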
Upvotes: 2
Reputation: 29
Try this:
$ch = curl_init();
curl_setopt($ch, CURLOPT_URL,"http://www.gis.catawba.nc.us/website/Parcel/parcel_main.asp?Cmd=QUERY&key=372215634301&type=P&width=1280&height=923");
curl_setopt($ch, CURLOPT_TIMEOUT, 30); //timeout after 30 seconds
curl_setopt($ch, CURLOPT_RETURNTRANSFER,1);
$result=curl_exec ($ch);
curl_close ($ch);
echo $result;exit;
You will get the full page source, and then you can extract whatever you want with preg_replace.
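As a rough illustration of the regular-expression route, here is a sketch that uses `preg_match` (the matching counterpart of `preg_replace`, swapped in here because the goal is to capture a value rather than substitute one). The `matchCell()` helper and the sample fragment are assumptions; the pattern only works if the value is plain text directly inside the cell, and regular expressions tend to be more fragile against HTML than a DOM parser.

```php
<?php
/**
 * Hypothetical helper: return the text after $label inside the first <td>
 * that begins with it, or null. Assumes the cell holds no nested tags.
 */
function matchCell(string $html, string $label): ?string
{
    $pattern = '/<td[^>]*>\s*' . preg_quote($label, '/') . '\s*([^<]*)/i';
    if (preg_match($pattern, $html, $m) !== 1) {
        return null;
    }
    return trim(html_entity_decode($m[1]));
}

// Hypothetical fragment standing in for $result from curl_exec().
$result = '<tr><td>Name: PENLEY TREASURE B</td>'
        . '<td>Parcel Address: 3152 7TH AV SE</td>'
        . '<td class="words13">Address: 5508 SWINGING BRIDGE RD</td></tr>';

echo matchCell($result, 'Name:'), "\n";
echo matchCell($result, 'Address:'), "\n";
```

Because the pattern anchors the label to the start of the cell, `Address:` does not pick up the `Parcel Address:` cell.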
Upvotes: 1