onlineracoon

Reputation: 2970

SEO sitemap.xml dynamic content

Let's say we have these pages:

1. http://www.mywebsite.com/users/thomas-roberts
2. http://www.mywebsite.com/pages/thomas-roberts/1
3. http://www.mywebsite.com/pages/thomas-roberts/hello-kitty-collection

Is it possible to do something like this in a sitemap.xml:

<?xml version="1.0" encoding="utf-8"?>

<urlset xmlns="http://www.sitemaps.org/schemas/sitemap/0.9">
    <url>
        <loc>http://mywebsite.com/users/^(\w+)$/</loc>
        <lastmod>2006-11-18</lastmod>
        <changefreq>daily</changefreq>
        <priority>1</priority>
    </url>
    <url>
        <loc>http://mywebsite.com/users/^(\w+)$/pages/^(\w+)$</loc>
        <lastmod>2006-11-18</lastmod>
        <changefreq>daily</changefreq>
        <priority>0.8</priority>
    </url>
    <url>
        <loc>http://mywebsite.com/users/^(\w+)$/pages/^(\d+)$</loc>
        <lastmod>2006-11-18</lastmod>
        <changefreq>daily</changefreq>
        <priority>0.6</priority>
    </url>
</urlset>

I hope my example is clear: we don't specify a new "url" element for every single page in the sitemap.xml file, but instead match a regex against the URL, and the crawler just comes back every time to update.

If this isn't possible, how do Twitter and Facebook get all their pages (profile pages etc.) indexed in Google? Do they generate a new sitemap every time a new user is created, and update their sitemap every time someone updates their page / profile?

I was very curious: if we indeed have to generate the sitemap.xml somehow (which has a limit of 50,000 items and 10 MB per file), what would be a good way to regenerate the sitemaps when content gets modified?

Thanks a lot.

Upvotes: 2

Views: 2236

Answers (3)

eQ19

Reputation: 10701

I think the best idea is to keep the URLs in a database (or a cache), updated by a script run as a cron job. If the sitemap.xml can be generated within the server's time limit, you can build it on the fly from that data. See here for an example: https://stackoverflow.com/a/29468042/4058484
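A minimal sketch of that on-the-fly approach, assuming a SQLite database; the `users` table and its `username` / `updated_at` columns are hypothetical, so adapt them to your own schema:

```python
import sqlite3


def build_sitemap(db_path):
    """Build a sitemap.xml string with one explicit <url> entry per user.

    The `users` table and `username` / `updated_at` columns are
    hypothetical placeholders for your own schema.
    """
    conn = sqlite3.connect(db_path)
    rows = conn.execute("SELECT username, updated_at FROM users").fetchall()
    conn.close()

    entries = []
    for username, updated_at in rows:
        entries.append(
            "    <url>\n"
            f"        <loc>http://www.mywebsite.com/users/{username}</loc>\n"
            f"        <lastmod>{updated_at}</lastmod>\n"
            "        <changefreq>daily</changefreq>\n"
            "        <priority>1.0</priority>\n"
            "    </url>"
        )

    return (
        '<?xml version="1.0" encoding="utf-8"?>\n'
        '<urlset xmlns="http://www.sitemaps.org/schemas/sitemap/0.9">\n'
        + "\n".join(entries)
        + "\n</urlset>"
    )
```

The same function can be invoked by the cron job to write the file to disk instead of serving it per request.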

However, if you have a huge amount of data, it is best to split the URLs across multiple sitemap files. This is allowed as long as they are all listed in a sitemap index file, which you reference from robots.txt; see details here: http://www.sitemaps.org/protocol.html#sitemapIndexXMLExample.
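For illustration, such a sitemap index might look like this (the child sitemap file names are made up for the example):

```xml
<?xml version="1.0" encoding="utf-8"?>
<sitemapindex xmlns="http://www.sitemaps.org/schemas/sitemap/0.9">
    <sitemap>
        <loc>http://www.mywebsite.com/sitemap-users-1.xml</loc>
        <lastmod>2006-11-18</lastmod>
    </sitemap>
    <sitemap>
        <loc>http://www.mywebsite.com/sitemap-users-2.xml</loc>
        <lastmod>2006-11-18</lastmod>
    </sitemap>
</sitemapindex>
```

You then point crawlers at the index with a single line in robots.txt, e.g. `Sitemap: http://www.mywebsite.com/sitemap-index.xml`.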

Upvotes: 0

Unfortunately, sitemap files require explicit URLs. The robots.txt file, by contrast, does admit a limited wildcard syntax through the * and $ signs to represent a set of URLs, but that's not the case for sitemap files.

Upvotes: 0

John Conde

Reputation: 219804

The sitemap must contain actual URLs. Regexes are not acceptable, and would be quite useless anyway, as they don't tell the search engines anything.

Sitemaps just tell search engines where to find your content. So if a page's content is modified, the sitemap itself really won't affect how that page is handled as far as search engines are concerned.

Upvotes: 3
