Vincent Musk

Reputation: 548

Does metadata have to be in the <head> on PHP pages, or can you just write it above the PHP code?

Can I just write the metadata like this:

<meta name="robots" content="noindex">
    <?php
    
    echo "Hello World";

Or do you have to include it in the <head> because Google specifies this on their website?

<html><head>
<meta name="robots" content="noindex" />
</head>
</html>
 <?php
        
 echo "Hello World";

?>

Upvotes: 0

Views: 325

Answers (1)

IMSoP

Reputation: 97718

The first thing to understand here is that web browsers and crawlers don't see your PHP code. PHP is just the tool that you happen to be using to generate your HTML. So the answer to your question would be the same if you created that HTML in Notepad.

So this PHP:

<meta name="robots" content="noindex">
    <?php
    
    echo "Hello World";

Just produces this HTML:

<meta name="robots" content="noindex">
    Hello World

The next thing to understand is that this is invalid HTML. There is no version of the HTML specification that allows an HTML document to look like that. However, the WHATWG HTML Living Standard includes a standardised parsing process which defines what browsers and crawlers should do even if the document is invalid.

So let's trace how this will be interpreted:

  1. We start in the "initial" insertion mode
  2. That insertion mode doesn't have a matching rule for our <meta> tag, so we fall to "Anything else" which tells us to go to "before html" insertion mode and try again.
  3. That insertion mode also has no rule, and tells us to insert an implicit <html> and go to "before head" insertion mode.
  4. That insertion mode still has no matching rule, so tells us to insert an implicit <head> and go to "in head" insertion mode.
  5. That insertion mode finally includes a rule for "A start tag whose tag name is 'meta'". This is where we'd expect meta tags, so it processes as normal.
  6. We then encounter the "Hello World" text while still in the "in head" insertion mode, which falls into "Anything else", closes the <head>, and goes to "after head" insertion mode.
  7. That insertion mode can't handle text either, so adds an implicit <body> and tries "in body" insertion mode.
  8. That insertion mode finally knows how to deal with the text and inserts it.
  9. The parser reaches the end of the file and stops parsing. This implicitly closes any open tags.

So the result is equivalent to HTML that looks like this (adding some whitespace for readability):

<html>
<head>
<meta name="robots" content="noindex">
</head>
<body>
    Hello World
</body>
</html>

Since that's where the meta tag is expected, anything following the current spec should treat it the same as if you'd put it in the <head> properly.
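
To make that walk through the insertion modes easier to follow, here is a minimal sketch in Python. It is a toy model, not the WHATWG algorithm: the function name `place_tokens` and the two token kinds are invented for illustration, and it assumes the input contains only <meta> start tags and non-whitespace text (no doctype, no end tags, no other elements).

```python
# Toy model of a few HTML tree-construction insertion modes.
# Assumption: tokens are only ("meta", name) or ("text", string);
# whitespace, end tags, and every other element are left out.

def place_tokens(tokens):
    """Return (head, body): the tokens that end up inside each element."""
    head, body = [], []
    mode = "initial"
    for kind, value in tokens:
        while True:  # loop so a token can be reprocessed in the next mode
            if mode == "initial":
                mode = "before html"   # "Anything else": switch and retry
            elif mode == "before html":
                mode = "before head"   # implicit <html> is inserted
            elif mode == "before head":
                mode = "in head"       # implicit <head> is inserted
            elif mode == "in head":
                if kind == "meta":
                    head.append((kind, value))
                    break
                mode = "after head"    # implicit </head>
            elif mode == "after head":
                # In this toy model only a text token can arrive here,
                # because "after head" is entered while reprocessing text.
                mode = "in body"       # implicit <body> is inserted
            else:  # "in body": everything is inserted where the parser is
                body.append((kind, value))
                break
    return head, body


head, body = place_tokens([("meta", "robots"), ("text", "Hello World")])
print(head)  # [('meta', 'robots')]
print(body)  # [('text', 'Hello World')]
```

The `while True` loop plays the role of "reprocess the token": a token keeps being retried until some mode actually inserts it.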


Your second example, ignoring the PHP, is this HTML:

<html><head>
<meta name="robots" content="noindex" />
</head>
</html>
 Hello World

Note that this is also invalid: the text content comes after the closing </html> tag. This puts the parser in the "after after body" insertion mode (no, really!), which in this case ends up treating the text as part of the body, so it makes no difference.


We could look at a different scenario, though, where there's text before the meta tag:

Hello World
<meta name="robots" content="noindex">
Goodbye World

This will parse differently...

  1. "initial" insertion mode says go to "before html" insertion mode
  2. "before html" insertion mode says insert <html> then go to "before head" insertion mode
  3. "before head" insertion mode says insert <head> then go to "in head" insertion mode
  4. This time, "in head" insertion mode doesn't have a matching rule, so closes the <head> and goes to "after head" insertion mode
  5. "after head" insertion mode inserts a <body> and goes to "in body" insertion mode
  6. This time we add the "Hello World" text to the body and move on to the <meta> tag.
  7. The "in body" rules for a <meta> start tag say "Process the token using the rules for the 'in head' insertion mode." Those rules allow the browser to read out the character encoding, but otherwise just say to insert the node at the current position, which is now inside the <body>, and carry on.
  8. Finally, still in "in body" insertion mode, we meet the "Goodbye World", and output it.
  9. As before, all open tags are implicitly closed at the end of the file.

So the result in this case is this:

<html>
<head></head>
<body>
Hello World
<meta name="robots" content="noindex">
Goodbye World
</body>
</html>

So our <meta> tag is now actually considered to be inside the body, not the head, of the document. Exactly what happens next isn't defined by HTML, because it doesn't define the "robots" meta tag specifically. The ultimate answer is "it depends", both on what metadata you're defining and on who is extracting it. If the crawler assumes that your meta tag will be in the head, it simply won't notice this one in the body.
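
As a rough illustration of that last point, here is a sketch in Python of a hypothetical crawler that only trusts robots directives found inside the <head>. The names `RobotsMetaFinder` and `find_robots` are made up for this example, and no real crawler is claimed to work this way. Python's built-in `html.parser` does not insert implied tags, so the sketch is fed the already-normalised markup shown above rather than the raw input.

```python
from html.parser import HTMLParser


class RobotsMetaFinder(HTMLParser):
    """Records each robots meta tag and whether it was seen inside <head>."""

    def __init__(self):
        super().__init__()
        self.in_head = False
        self.found = []  # list of (content, location) tuples

    def handle_starttag(self, tag, attrs):
        if tag == "head":
            self.in_head = True
        elif tag == "meta":
            attributes = dict(attrs)
            if (attributes.get("name") or "").lower() == "robots":
                location = "head" if self.in_head else "outside head"
                self.found.append((attributes.get("content"), location))

    def handle_endtag(self, tag):
        if tag == "head":
            self.in_head = False


def find_robots(html):
    """Hypothetical helper: return every robots meta tag with its location."""
    finder = RobotsMetaFinder()
    finder.feed(html)
    return finder.found


normalised = """<html>
<head></head>
<body>
Hello World
<meta name="robots" content="noindex">
Goodbye World
</body>
</html>"""

found = find_robots(normalised)
print(found)  # [('noindex', 'outside head')]

# A crawler that only honours directives in the head sees nothing here.
head_only = [content for content, where in found if where == "head"]
print(head_only)  # []
```

A crawler that collects robots meta tags from anywhere in the document would still pick up `noindex` here, which is why the outcome depends on who is reading the page.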


The bottom line is that if you just stick to valid HTML, you don't need to worry about any of this!

Upvotes: 3
