Reputation: 926
I'm retrieving an XML document (in this case an RSS feed) from a remote source using the DOMDocument feature of PHP. It returns the XML as a DOM object and I can access content of the XML tags like this:
$url = $_POST['url']; // eg. http://example.com/page.xml
$xmlDoc = new DOMDocument();
$xmlDoc -> load($url);
$channel = $xmlDoc -> getElementsByTagName('channel') -> item(0);
This works fine for me, but I was wondering if there is a way to check whether the server serving the document is sending the correct content-type header, which in this case should be text/xml or application/xml. How could I determine the content-type header being sent?
I guess what I'm trying to do is get one step closer to determining whether the document is valid XML. I know that looking at the content-type header doesn't guarantee this, but I might rule out some errors if the wrong header is being sent.
Upvotes: 1
Reputation: 270609
This is one of those areas where PHP does something automagic that is difficult to discover without years of experience. Calling DOMDocument::load() on a URL invokes PHP's http/https stream wrappers to load the URL. Doing so populates a special variable called $http_response_header, an array holding the response headers from the immediately preceding http/https stream call.
So right after $xmlDoc->load($url), inspect $http_response_header. Note that it is not an easily parsed associative array but a plain list of raw header lines. You need to find the line that begins with Content-Type: and split it on the colon :.
$xmlDoc = new DOMDocument();
$xmlDoc->load($url);
// Stays null if the server sent no Content-Type header
$content_type = null;
// Loop over the array and look for the desired header
foreach ($http_response_header as $header) {
    // Find the header with a case-insensitive search
    // for Content-Type:
    if (stripos($header, 'Content-Type:') === 0) {
        // and split it on the first : to take the second value
        // Example: "Content-Type: application/xml; charset=UTF-8"
        $content_type = trim(explode(':', $header, 2)[1]);
        // You can break out of the loop after finding it
        break;
    }
}
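The header value usually carries parameters such as a charset, so comparing it directly against text/xml or application/xml will often fail. Below is a minimal sketch of one way to handle that, not a definitive implementation. The function names contentTypeFromHeaders and isXmlMediaType are hypothetical, and it assumes that when redirects are followed the array contains the headers of every response in order, so the last Content-Type line is the one that describes the document actually loaded.

```php
<?php
// Hypothetical helper: extract the bare media type from a list of raw
// header lines such as the ones PHP places in $http_response_header.
function contentTypeFromHeaders(array $headers): ?string
{
    $mediaType = null;
    foreach ($headers as $header) {
        // Match "Content-Type:" case-insensitively at the start of the line
        if (stripos($header, 'Content-Type:') === 0) {
            // A limit of 2 keeps any later colons inside the value
            $value = explode(':', $header, 2)[1];
            // Drop parameters such as "; charset=UTF-8"
            $mediaType = strtolower(trim(explode(';', $value, 2)[0]));
            // No break here (assumption): keep the last match so that
            // headers from earlier redirect responses are overridden.
        }
    }
    return $mediaType;
}

// Hypothetical helper: the two media types named in the question.
function isXmlMediaType(?string $mediaType): bool
{
    return in_array($mediaType, ['text/xml', 'application/xml'], true);
}
```

Right after $xmlDoc->load($url) you could then call isXmlMediaType(contentTypeFromHeaders($http_response_header ?? [])) and treat a false result as a warning sign rather than proof that the document is not XML.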
A point of caution - if you are accepting a URL from a form via $_POST, you may wish to place some restrictions on what values are acceptable. You could be exposing yourself to some security issues by retrieving any arbitrary URL (denial of service attacks come to mind, possibly proxy abuse too).
// Careful not to accept just any url anyone sends...
$url = $_POST['url'];
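As a rough illustration of such a restriction, the sketch below accepts only well-formed absolute URLs with an http or https scheme. The function name isAllowedFeedUrl is hypothetical, and this is only a first filter: it does not stop requests to internal hosts or guard against very large responses, so treat it as a starting point rather than a complete defence.

```php
<?php
// Hypothetical helper: accept only absolute http/https URLs.
function isAllowedFeedUrl(string $url): bool
{
    // Reject anything that is not a syntactically valid URL
    if (filter_var($url, FILTER_VALIDATE_URL) === false) {
        return false;
    }
    // Allow only the schemes handled by the http/https stream wrappers,
    // which rules out file://, ftp://, php:// and similar
    $scheme = strtolower((string) parse_url($url, PHP_URL_SCHEME));
    return in_array($scheme, ['http', 'https'], true);
}
```

You would call it before loading, for example by checking isAllowedFeedUrl($_POST['url'] ?? '') and refusing the request when it returns false.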
Upvotes: 2