Multilingual sitemaps

To serve a multilingual sitemap, we create a Sitemap index file that lists one Sitemap file for each language we support.

Sitemap index file

We place the page named sitemap.html in the root directory of the site. It points to the other localized sitemaps in the respective language subfolders.

---
layout: none

sitemap:
  excluded: true
---

<?xml version="1.0" encoding="UTF-8"?>
<sitemapindex xmlns="http://www.sitemaps.org/schemas/sitemap/0.9">

  {%- assign pages = site.pages | where: 'language_reference', 'sitemap' %}

  {%- for page in pages %}
    <sitemap>
      <loc>{{ site.absoluteurl }}{{ page.url | remove: 'index.html' }}</loc>

      {%- if page.sitemap.lastmod %}
        {%- assign lastmod = page.sitemap.lastmod | date: '%Y-%m-%d' %}
      {%- elsif page.date %}
        {%- assign lastmod = page.date | date_to_xmlschema %}
      {%- else %}
        {%- assign lastmod = site.time | date_to_xmlschema %}
      {%- endif %}
      <lastmod>{{ lastmod }}</lastmod>
    </sitemap>
  {%- endfor %}

</sitemapindex>

By setting the following variables in the front matter of the Sitemap index file:

sitemap:
  excluded: true

we make sure to exclude it from the list of pages returned in each language Sitemap file.
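
Note that site.absoluteurl is not one of Jekyll's built-in variables, so the templates here presumably rely on a custom absoluteurl key defined in _config.yml. As a hedged alternative, assuming url (and optionally baseurl) are set in _config.yml and the site runs Jekyll 3.3 or later, the built-in absolute_url filter could build the same kind of address:

```liquid
{%- comment %}
  Sketch only: replaces the custom site.absoluteurl prefix with Jekyll's
  built-in absolute_url filter, which prepends site.url and site.baseurl.
{%- endcomment %}
<loc>{{ page.url | remove: 'index.html' | absolute_url }}</loc>
```

Whether this fits depends on how absoluteurl is defined for the site; if it differs from url plus baseurl, the generated addresses will differ as well.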

Sitemap files

We then place a dedicated page named sitemap.xml in each of the language subdirectories of the site. For example, here is the English sitemap.xml page, front matter included:

---
layout: none

title: English Sitemap

language: en
language_reference: sitemap

sitemap:
  excluded: true
---

<?xml version="1.0" encoding="UTF-8"?>
<urlset xmlns="http://www.sitemaps.org/schemas/sitemap/0.9">

  {%- assign posts = site.posts | sort: 'date' | where: 'language', page.language | where: 'published', true %}

  {%- for post in posts reversed %}
    {%- unless post.sitemap.excluded == true %}
    <url>
      <loc>{{ site.absoluteurl }}{{ post.url }}</loc>

      {%- if post.sitemap.lastmod %}
        {%- assign lastmod = post.sitemap.lastmod | date: '%Y-%m-%d' %}
      {%- elsif post.date %}
        {%- assign lastmod = post.date | date_to_xmlschema %}
      {%- else %}
        {%- assign lastmod = site.time | date_to_xmlschema %}
      {%- endif %}
      <lastmod>{{ lastmod }}</lastmod>

      {%- if post.sitemap.changefreq %}
        {%- assign changefreq = post.sitemap.changefreq %}
      {%- else %}
        {%- assign changefreq = 'monthly' %}
      {%- endif %}
      <changefreq>{{ changefreq }}</changefreq>

      {%- if post.sitemap.priority %}
        {%- assign priority = post.sitemap.priority %}
      {%- else %}
        {%- assign priority = 0.5 %}
      {%- endif %}
      <priority>{{ priority }}</priority>
    </url>
    {%- endunless %}
  {%- endfor %}

  {%- assign pages = site.pages | where: 'language', page.language %}

  {%- for page in pages %}
    {%- unless page.sitemap.excluded == true %}
    <url>
      <loc>{{ site.absoluteurl }}{{ page.url | remove: 'index.html' }}</loc>

      {%- if page.sitemap.lastmod %}
        {%- assign lastmod = page.sitemap.lastmod | date: '%Y-%m-%d' %}
      {%- elsif page.date %}
        {%- assign lastmod = page.date | date_to_xmlschema %}
      {%- else %}
        {%- assign lastmod = site.time | date_to_xmlschema %}
      {%- endif %}
      <lastmod>{{ lastmod }}</lastmod>

      {%- if page.sitemap.changefreq %}
        {%- assign changefreq = page.sitemap.changefreq %}
      {%- else %}
        {%- assign changefreq = 'monthly' %}
      {%- endif %}
      <changefreq>{{ changefreq }}</changefreq>

      {%- if page.sitemap.priority %}
        {%- assign priority = page.sitemap.priority %}
      {%- else %}
        {%- assign priority = 0.3 %}
      {%- endif %}
      <priority>{{ priority }}</priority>
    </url>
    {%- endunless %}
  {%- endfor %}

</urlset>

Each language Sitemap file contains two for loops:

  • the first loop goes through the posts in the page's language, newest first, and lists every post that does not have sitemap: excluded: true set in its front matter
  • the second loop goes through the pages in the same language and, similarly, lists every page that does not have sitemap: excluded: true set in its front matter

We can override the default lastmod, changefreq, and priority values by setting the following variables in the front matter of the post or page:

sitemap:
  lastmod: 2021-08-15 08:00:00 +0300
  changefreq: monthly
  priority: 0.5
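
The if/else fallbacks used in the templates can also be written more compactly. The following is a minimal sketch, assuming Liquid's standard default filter; it is not the template used on this site, and it renders every lastmod with date_to_xmlschema rather than the '%Y-%m-%d' format applied above to sitemap.lastmod:

```liquid
{%- comment %} Sketch: use the front matter value if set, otherwise the fallback. {%- endcomment %}
{%- assign lastmod = page.sitemap.lastmod | default: page.date | default: site.time %}
<lastmod>{{ lastmod | date_to_xmlschema }}</lastmod>
<changefreq>{{ page.sitemap.changefreq | default: 'monthly' }}</changefreq>
<priority>{{ page.sitemap.priority | default: 0.3 }}</priority>
```

The same lines work inside the posts loop by replacing page with post and 0.3 with 0.5.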

Again, we can exclude a post or page from being returned in a sitemap by setting the following variables in the front matter of the file:

sitemap:
  excluded: true

RSS feed

Coming soon…

Page not found

Coming soon…

Resources

Afterword

If you would like to add something on the subject or have spotted something worth fixing, please feel free to drop me a line or create an issue on GitHub: thoughts, critiques, and suggestions are welcome.

Thank you!