Reputation: 2970
Let's say we have these pages:
1. http://www.mywebsite.com/users/thomas-roberts
2. http://www.mywebsite.com/pages/thomas-roberts/1
3. http://www.mywebsite.com/pages/thomas-roberts/hello-kitty-collection
Is it possible to do something like this in a sitemap.xml:
<?xml version="1.0" encoding="utf-8"?>
<urlset xmlns="http://www.sitemaps.org/schemas/sitemap/0.9">
  <url>
    <loc>http://mywebsite.com/users/^(\w+)$/</loc>
    <lastmod>2006-11-18</lastmod>
    <changefreq>daily</changefreq>
    <priority>1</priority>
  </url>
  <url>
    <loc>http://mywebsite.com/users/^(\w+)$/pages/^(\w+)$</loc>
    <lastmod>2006-11-18</lastmod>
    <changefreq>daily</changefreq>
    <priority>0.8</priority>
  </url>
  <url>
    <loc>http://mywebsite.com/users/^(\w+)$/pages/^(\d+)$</loc>
    <lastmod>2006-11-18</lastmod>
    <changefreq>daily</changefreq>
    <priority>0.6</priority>
  </url>
</urlset>
I hope my example is clear: instead of specifying a new "url" element in the sitemap.xml file for every single page, we match the URL against a regex, and the crawler just comes back every time to check for updates.
If this isn't possible, how do Twitter and Facebook get all their pages (profile pages etc.) indexed by Google? Do they generate a new sitemap every time a new user is created, and update their sitemap every time someone updates their page / profile?
I was very curious: if we do indeed have to generate the sitemap.xml somehow (which is limited to 50,000 items and 10 MB per file), what would be a good way to generate sitemaps when content gets modified?
Thanks a lot.
Upvotes: 2
Views: 2236
Reputation: 10701
I think the best idea is to update the URLs in a database (or a cache) with a script run by a cron job. If the sitemap.xml can be generated within the server's time limit, you can build it on the fly from that data. See here for an example: https://stackoverflow.com/a/29468042/4058484
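A minimal sketch of such a cron-driven generator, assuming a hypothetical SQLite "users" table with "slug" and "updated_at" columns (adapt the query to your actual schema):

import sqlite3
from xml.sax.saxutils import escape

def generate_sitemap(db_path="site.db", out_path="sitemap.xml"):
    # Pull every user slug plus its last-modified date (hypothetical schema).
    conn = sqlite3.connect(db_path)
    rows = conn.execute("SELECT slug, updated_at FROM users").fetchall()
    conn.close()
    with open(out_path, "w", encoding="utf-8") as f:
        f.write('<?xml version="1.0" encoding="utf-8"?>\n')
        f.write('<urlset xmlns="http://www.sitemaps.org/schemas/sitemap/0.9">\n')
        for slug, updated_at in rows:
            f.write("  <url>\n")
            f.write(f"    <loc>http://www.mywebsite.com/users/{escape(slug)}</loc>\n")
            # lastmod must be a W3C date, e.g. 2006-11-18
            f.write(f"    <lastmod>{updated_at}</lastmod>\n")
            f.write("  </url>\n")
        f.write("</urlset>\n")

if __name__ == "__main__":
    generate_sitemap()

A cron entry such as 0 * * * * python generate_sitemap.py would then keep the file fresh without regenerating it on every request.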
However, if you have a huge amount of data, then it is best to split the URLs across multiple sitemaps, which is allowed as long as they are all listed in a sitemap index file referenced in robots.txt; see the details here: http://www.sitemaps.org/protocol.html#sitemapIndexXMLExample.
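For reference, a sitemap index as described on that protocol page looks like this (the file names here are made up):

<?xml version="1.0" encoding="UTF-8"?>
<sitemapindex xmlns="http://www.sitemaps.org/schemas/sitemap/0.9">
  <sitemap>
    <loc>http://www.mywebsite.com/sitemap-users-1.xml</loc>
    <lastmod>2006-11-18</lastmod>
  </sitemap>
  <sitemap>
    <loc>http://www.mywebsite.com/sitemap-users-2.xml</loc>
    <lastmod>2006-11-18</lastmod>
  </sitemap>
</sitemapindex>

Each referenced sitemap may itself hold up to the 50,000-URL limit, so the index lets you scale well beyond a single file.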
Upvotes: 0
Reputation: 594
Unfortunately, sitemap files require explicit URLs in them. The robots.txt file, by contrast, does admit a limited wildcard syntax (the * and $ signs) to represent a set of URLs, but that's not the case for sitemap files.
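For illustration, a single robots.txt pattern can cover a whole set of URLs (a sketch with hypothetical paths):

User-agent: *
# "*" matches any sequence of characters; "$" anchors the end of the URL
Disallow: /users/*/drafts$

A sitemap, in contrast, would have to list every matching URL explicitly.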
Upvotes: 0
Reputation: 219804
The sitemap must contain actual URLs. Regexes are not acceptable, and they would be quite useless anyway, since they don't tell the search engines anything.
Sitemaps just tell search engines where to find your content. So if a page's content is modified, the sitemap really won't affect how search engines handle it.
Upvotes: 3