Peter Cullen

Reputation: 926

PHP get the content type header of DOMDocument loaded from url

I'm retrieving an XML document (in this case an RSS feed) from a remote source using the DOMDocument feature of PHP. It returns the XML as a DOM object and I can access content of the XML tags like this:

$url     =  $_POST['url']; // eg. http://example.com/page.xml
$xmlDoc  =  new DOMDocument();
$xmlDoc  -> load($url);
$channel =  $xmlDoc -> getElementsByTagName('channel') -> item(0);

This works fine for me, but I was wondering if there was a way I could check if the server serving the document is sending the correct content-type header, which in this case should be text/xml or application/xml. How could I determine the content-type header being sent?

I guess something I'm trying to do is get one step closer to determining if the document is valid XML. I know that looking at the content-type header doesn't guarantee this, but I might rule out some errors if the wrong header is being sent.

Upvotes: 1

Views: 853

Answers (1)

Michael Berkowski

Reputation: 270609

This is one of those areas where PHP does some automagic behavior that's difficult to discover without many years of experience digging it out. Calling DOMDocument::load() on a URL invokes PHP's http/https stream wrappers to load the URL. Doing so populates a special variable called $http_response_header representing an array of headers from whatever the immediately preceding http/https stream call was.

So right after $xmlDoc->load($url), inspect $http_response_header. Note that it is a flat list of raw header lines rather than an associative array, so you need to find the line that starts with Content-Type: and split it on the first colon to get the value.

$xmlDoc = new DOMDocument();
$xmlDoc->load($url);

$content_type = null;
// Loop over the array and look for the desired header
foreach ($http_response_header as $header) {
  // Find the header with a case-insensitive search
  // for Content-Type:
  if (stripos($header, 'Content-Type:') === 0) {
    // and split it on the first : to take the value
    // Example: "Content-Type: application/xml; charset=UTF-8"
    $content_type = trim(explode(':', $header, 2)[1]);
    // Break out of the loop only after finding it
    break;
  }
}
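
If you want to compare the result against text/xml or application/xml, it may help to strip parameters such as charset first. The following is only a sketch: extract_content_type() is a hypothetical helper name, not a PHP built-in, and the sample array stands in for $http_response_header. It assumes that when redirects are followed the array can hold header lines from more than one response, so it keeps the last Content-Type it sees instead of the first.

```php
<?php
// Hypothetical helper (not part of PHP): returns the lower-cased media type
// from a list of raw header lines, or null if no Content-Type line exists.
function extract_content_type(array $headers): ?string
{
    $media_type = null;
    foreach ($headers as $header) {
        if (stripos($header, 'Content-Type:') === 0) {
            // Take everything after the first colon
            $value = trim(explode(':', $header, 2)[1]);
            // Drop parameters such as "; charset=UTF-8"
            $media_type = strtolower(trim(explode(';', $value)[0]));
            // No break here: a later response overwrites an earlier one
        }
    }
    return $media_type;
}

// Sample data standing in for $http_response_header after a redirect
$sample = [
    'HTTP/1.1 301 Moved Permanently',
    'Content-Type: text/html',
    'HTTP/1.1 200 OK',
    'Content-Type: application/xml; charset=UTF-8',
];

$type   = extract_content_type($sample);
$is_xml = in_array($type, ['text/xml', 'application/xml'], true);
```

In real use you would pass $http_response_header right after the load() call. Feeds are also commonly served as application/rss+xml, so you may want to add that to the accepted list.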

A point of caution: if you are accepting a URL from a form via $_POST, you may wish to place some restrictions on what values are acceptable. Retrieving any arbitrary URL could expose you to security issues (denial of service attacks come to mind, and possibly proxy abuse too).

// Careful not to accept just any url anyone sends...
$url = $_POST['url'];
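
One possible first restriction, as a minimal sketch: accept only well-formed http or https URLs before passing the value to load(). The function name is_acceptable_feed_url() is hypothetical. This check does not stop requests to internal hosts or limit response size, so treat it as a starting point rather than a complete defense.

```php
<?php
// Hypothetical helper: accepts only absolute http/https URLs.
function is_acceptable_feed_url(string $url): bool
{
    // Reject anything PHP does not consider a syntactically valid URL
    if (filter_var($url, FILTER_VALIDATE_URL) === false) {
        return false;
    }
    // Allow only the http and https schemes (rejects file://, ftp://, etc.)
    $scheme = strtolower((string) parse_url($url, PHP_URL_SCHEME));
    return in_array($scheme, ['http', 'https'], true);
}

var_dump(is_acceptable_feed_url('http://example.com/page.xml')); // bool(true)
var_dump(is_acceptable_feed_url('file:///etc/passwd'));          // bool(false)
var_dump(is_acceptable_feed_url('not a url'));                   // bool(false)
```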

Upvotes: 2
