Jim

Reputation: 1735

How to use Jekyll site.pages to generate sitemap.xml

I'm trying to use site.pages to automatically generate sitemap.xml in Jekyll (GitHub Pages). This is the sitemap.xml code I have so far:

---
---
<?xml version="1.0" encoding="UTF-8"?>
<urlset xmlns="http://www.sitemaps.org/schemas/sitemap/0.9">
    {% for page in site.pages %}
    <url>
        <loc>https://example.com{{ page.url | remove: 'index.html' }}</loc>
    </url>
    {% endfor %}
</urlset>

Its output looks something like this:

<urlset xmlns="http://www.sitemaps.org/schemas/sitemap/0.9">
  <url>
    <loc>https://example.com/samplepage.html</loc>
    <!--<loc>https://example.com/samplepage</loc>-->
  </url>
</urlset>

My goal is to generate a sitemap.xml without the trailing .html, as in the commented line. I tried gsub (I assumed Jekyll accepts Ruby syntax; see "Replace words in string - ruby"), but it either changes nothing or removes page.url completely.

I'd appreciate it if anyone could:

  1. modify the Jekyll syntax so that it generates URLs without the trailing .html.
  2. explain the syntax of | remove: 'index.html' (which removes the URL https://example.com/index.html from the generated sitemap.xml).

I'm very unfamiliar with Jekyll so apologies if the question seems trivial.

Upvotes: 5

Views: 2265

Answers (2)

David Jacquel

Reputation: 52799

Jekyll generates every file in the site folder with its original extension, unless you set a permalink for it.

If you create a sitemap.xml file like this:

---
layout: null
---
<?xml version="1.0" encoding="UTF-8"?>
<urlset xmlns="http://www.sitemaps.org/schemas/sitemap/0.9">
    {% for page in site.pages %}
    <url>
        <loc>https://example.com{{ page.url | remove: 'index.html' }}</loc>
    </url>
    {% endfor %}
</urlset>

It will be generated as sitemap.xml.

You can also use jekyll-sitemap, which is supported by GitHub Pages.

Upvotes: 2

approxiblue

Reputation: 7122

Jekyll uses Liquid to process templates. The pipe syntax is a Liquid filter:

Filters are simple methods. The first parameter is always the output of the left side of the filter. The return value of the filter will be the new left value when the next filter is run. When there are no more filters, the template will receive the resulting string.

remove is one of the standard Liquid filters, so the Jekyll documentation does not list it.
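
As a small illustration of how filters chain, the snippet below applies remove to a hard-coded string, first once and then twice. The variable name url and the sample paths are placeholders chosen for this sketch; they are not part of the original question.

```liquid
{% assign url = '/samplepage.html' %}

{% comment %} 'index.html' does not occur in the string, so nothing changes: /samplepage.html {% endcomment %}
{{ url | remove: 'index.html' }}

{% comment %} The second filter receives the output of the first and strips '.html': /samplepage {% endcomment %}
{{ url | remove: 'index.html' | remove: '.html' }}

{% comment %} For a home page, the first filter already leaves just the slash: / {% endcomment %}
{{ '/index.html' | remove: 'index.html' | remove: '.html' }}
```

Note that remove deletes every occurrence of the substring, not only a trailing one, so a path that contains ".html" somewhere in the middle would be altered as well.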


If you have this file in your root folder, the page URL is straightforward:

samplepage.html    # https://example.com/samplepage.html

Instead, if you have:

samplepage/
    index.html     # https://example.com/samplepage/index.html
                   # https://example.com/samplepage/

The page URL ends up being the folder name, and the server will automatically serve the index.html file inside if you use the second link.

site.pages will give you the first link. As you have a filter that removes index.html from your paths, you end up with extension-free URLs.
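
If you would rather keep samplepage.html in the root folder, one possible approach is to chain a second remove filter in the sitemap template. This is a sketch, not a tested drop-in: it assumes that none of your paths contain ".html" anywhere except at the end, and that your host actually serves the extension-free URL.

```liquid
---
layout: null
---
<?xml version="1.0" encoding="UTF-8"?>
<urlset xmlns="http://www.sitemaps.org/schemas/sitemap/0.9">
    {% for page in site.pages %}
    <url>
        {% comment %} Strip 'index.html' first, then any remaining '.html' extension. {% endcomment %}
        <loc>https://example.com{{ page.url | remove: 'index.html' | remove: '.html' }}</loc>
    </url>
    {% endfor %}
</urlset>
```

With this template, /samplepage.html becomes /samplepage and /index.html becomes /. Pages with other extensions, including sitemap.xml itself, keep their URLs unchanged.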

Upvotes: 2
