David Riccitelli

Reputation: 7812

MediaWiki/Wikipedia URL sanitization regex

When you create a page in MediaWiki/Wikipedia, the title is sanitized and used as part of the URL path. E.g. 'Lorem Ipsum' becomes 'Lorem_Ipsum'.

Do you know which regex is used for the sanitization? I can see it also accepts extended characters (like ü).

Upvotes: 0

Views: 100

Answers (1)

leo

Reputation: 8520

It depends a bit on the settings of your wiki, but basically:

  • Spaces are replaced with underscores (_); the two are treated as equivalent in the MediaWiki universe
  • Non-ASCII characters are percent-encoded (as UTF-8) in the URL
  • The first character is converted to uppercase (this can be disabled with $wgCapitalLinks)
  • Forward slashes can act as page/subpage separators, depending on per-namespace settings ($wgNamespacesWithSubpages)

There are a few restrictions as well, e.g. titles cannot begin with a colon. See https://www.mediawiki.org/wiki/Manual:Page_title for the full list.
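As a rough illustration of the rules above, here is a minimal Python sketch that turns a title into a URL path segment, assuming default-like settings. It is not MediaWiki's actual implementation (that lives in MediaWiki's PHP code and depends on the wiki's configuration). The function name, the capital_links flag, the collapsing of repeated spaces/underscores, and the set of characters left unescaped are all assumptions for illustration.

```python
import re
from urllib.parse import quote


def title_to_url_path(title: str, capital_links: bool = True) -> str:
    """Approximate how a MediaWiki page title becomes a URL path segment.

    Hypothetical, simplified sketch; not MediaWiki's real code.
    """
    # Spaces and underscores are equivalent: collapse runs of either into a
    # single underscore and trim them from the ends (assumed normalization).
    normalized = re.sub(r"[ _]+", "_", title).strip("_")
    if not normalized:
        raise ValueError("empty title")
    # One of the listed restrictions: a title cannot begin with a colon.
    if normalized.startswith(":"):
        raise ValueError("title cannot begin with a colon")
    # Uppercase the first character unless disabled
    # (capital_links is a hypothetical stand-in for the wiki setting).
    if capital_links:
        normalized = normalized[0].upper() + normalized[1:]
    # Percent-encode non-ASCII characters as UTF-8. The extra characters
    # kept literal in `safe` are an assumption; "/" is kept for subpages.
    return quote(normalized, safe="/:;@$!*(),~")
```

For example, title_to_url_path("Lorem Ipsum") returns "Lorem_Ipsum", and title_to_url_path("über alles") returns "%C3%9Cber_alles" (the ü is uppercased to Ü, then UTF-8 percent-encoded).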

Upvotes: 1
