WordPress's default slug sanitization helps to create clean and consistent URLs, but it can be limiting for users who want to include dots or other special characters in their permalinks. Before describing how to fix the problem, we need a solid understanding of what the problem actually is.
This behavior is intentional, since browsers have historically handled non-standard (non-ASCII) characters in URLs differently. The goal was to standardize URLs and maintain compatibility for users of older browsers like Netscape and Internet Explorer. Those browsers now account for only a small percentage of online visitors, but WordPress's approach has not changed.
Why are Latin characters the standard for URLs?
URLs, or uniform resource locators, are the addresses that users enter into their web browsers to access specific pages on a website. These addresses are limited to a specific set of characters known as the ASCII character set, which includes the standard English alphabet, numbers, and a few special characters. The unaccented Latin letters (A to Z) are part of the ASCII character set, and they make up the majority of the characters used in URLs.
The reason for this is that the original design of the internet and web browsers was primarily focused on English-language content. ASCII was chosen as the standard character set for URLs because it was simple, well-established, widely supported, and easy for early computer systems to process.
However, as the internet has grown to include content in a variety of languages and scripts, it has become increasingly important to be able to include non-ASCII characters in URLs. To accommodate this, non-ASCII characters are "percent-encoded": each character is converted to its UTF-8 bytes, and every byte is written as a percent sign (%) followed by its two-digit hexadecimal value. This allows non-ASCII characters to be included in URLs while the URL itself still consists only of ASCII characters.
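To make percent-encoding more concrete, here is a minimal Python sketch. It is only an illustration and has nothing to do with WordPress internals; the helper name percent_encode is made up for this example, while quote() and unquote() come from Python's standard library.

```python
from urllib.parse import quote, unquote


def percent_encode(segment: str) -> str:
    """Percent-encode a single URL path segment (hypothetical helper)."""
    # quote() encodes the text as UTF-8 and writes each byte that is not an
    # unreserved ASCII character as "%" followed by two hexadecimal digits.
    # safe="" means that "/" is encoded as well.
    return quote(segment, safe="")


encoded = percent_encode("café-ø")
print(encoded)           # caf%C3%A9-%C3%B8
print(unquote(encoded))  # café-ø
```

Note that "é" turns into two encoded bytes (%C3%A9), because it occupies two bytes in UTF-8, while plain ASCII letters and the hyphen are left untouched.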
Why does WordPress stick to Latin characters in permalinks?
In addition to compatibility concerns with older browsers, using non-standard characters in URLs can also cause issues with search engine optimization (SEO). Search engines may have difficulty properly indexing and ranking pages with non-standard characters in the URLs, which can negatively impact the visibility and reach of the content.
Special characters can also cause problems when sharing URLs or links. Some email clients and messaging platforms may not handle special characters in URLs properly, which can lead to broken links or other issues. By limiting the use of special characters in URLs, WordPress makes it more likely that links are shared and accessed correctly across all platforms.
While the limitations of using only standard characters in URLs may seem restrictive, it is important to understand that they are in place to ensure that the content is accessible to the widest possible audience. Website owners should keep this in mind when designing their URLs and decide whether special characters are truly necessary, or whether they can be replaced with dashes or other standard characters.
However, if you are looking to use special characters in your URL, there are a few ways to work around the limitations of WordPress's default slug sanitization process. One option is to use a plugin such as Permalink Manager that allows for the use of special characters in URLs.
How does WordPress remove special characters and sanitize slugs?
In WordPress, special and non-alphanumeric characters are automatically stripped from slugs (the part of the permalink that is generated from the post or term title) in order to create clean and consistent URLs. This process is known as slug sanitization. By default, WordPress handles the following characters:
- Spaces: Replaced with a hyphen (-)
- Non-alphanumeric characters: Removed completely
- Diacritics (accented characters): Replaced with the corresponding non-accented character
For example, if you create a post with the title "My Special Post!", the slug would be automatically generated as "my-special-post". Slug sanitization helps to ensure that URLs are clean, consistent, and easy to read and understand. It also helps to prevent potential issues with URLs that contain special characters or non-alphanumeric characters.
As already mentioned, non-standard characters in URLs are normally percent-encoded, that is, replaced with the hexadecimal values of their UTF-8 bytes. WordPress goes even further by limiting slugs to ASCII letters, digits, and dashes. In addition, all uppercase letters are automatically converted to their lowercase counterparts.
By default, WordPress sanitizes slugs (post names) with the sanitize_title() function. When the post is saved or updated, dots are replaced with hyphens, accents are stripped from letters, and other non-standard characters are removed from native slugs.
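The rules described in this section can be approximated in a few lines of Python. This is only a rough sketch that follows the rules as they are listed in this article; it is not WordPress's sanitize_title(), which is written in PHP and handles many more cases. The function name slugify_sketch is an assumption made for the example.

```python
import re
import unicodedata


def slugify_sketch(title: str) -> str:
    """Rough approximation of the default slug rules (not WordPress code)."""
    # Split accented letters into base letter + combining mark, then drop
    # the combining marks (for example "é" becomes "e").
    decomposed = unicodedata.normalize("NFKD", title)
    text = "".join(ch for ch in decomposed if not unicodedata.combining(ch))
    # Uppercase letters are converted to lowercase.
    text = text.lower()
    # Dots and whitespace are replaced with hyphens.
    text = re.sub(r"[.\s]+", "-", text)
    # Everything except ASCII letters, digits, and hyphens is removed.
    text = re.sub(r"[^a-z0-9-]", "", text)
    # Collapse repeated hyphens and trim them from both ends.
    return re.sub(r"-{2,}", "-", text).strip("-")


print(slugify_sketch("My Special Post!"))      # my-special-post
print(slugify_sketch("Version 2.0 Released"))  # version-2-0-released
```

The second call shows why dots are a problem for version numbers or file-like names: the native slug turns "2.0" into "2-0".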
How to allow additional special characters in WordPress URLs?
Permalink Manager, as previously indicated, allows for the use of additional special characters in permalinks. Simply open the Permalink Manager settings, go to the "Advanced settings" tab, and disable the "Strip special characters from slugs" option.
You may change the sanitization settings to allow dots or underscores in custom permalinks that you wish to adjust manually, as well as to keep them in new permalinks after the post or page is published. Make sure your "Slugs mode" option is set to "Use actual titles as slugs" instead of "Use native slugs" in order to take advantage of this feature. Check out the separate article for additional information on this function.
You may manually add dots or underscores to custom permalinks at any time, regardless of the "Slugs mode" you select, as long as the option "Strip special characters from slugs" is disabled. When both options are set, Permalink Manager will use the title for the new default permalink instead of the native slug (where the dots are replaced with hyphens).
How to keep the dots in old permalinks?
If you changed "Slugs mode" and would like to keep the dots inside the permalinks of previously added posts and terms, you will need to regenerate their custom permalinks.
How to keep accented letters in permalinks?
By default, all accented letters in custom permalinks will be replaced with their non-accented equivalents (e.g. Å => A, Æ => AE, Ø => O, Ć => C). To keep letters with accents, deactivate the remove_accents() function in the Permalink Manager settings.
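For illustration, the kind of replacement shown in the examples above can be sketched in Python. This is not the remove_accents() function itself, which is written in PHP and covers far more characters; the function name remove_accents_sketch and the tiny SPECIAL_CASES table are assumptions made for this example.

```python
import unicodedata

# Hypothetical, deliberately small lookup table. Letters such as "Æ" and "Ø"
# have no Unicode decomposition, so they need an explicit replacement.
SPECIAL_CASES = {"Æ": "AE", "æ": "ae", "Ø": "O", "ø": "o"}


def remove_accents_sketch(text: str) -> str:
    """Replace accented letters with unaccented ones (illustrative only)."""
    # First handle the letters that cannot be decomposed.
    replaced = "".join(SPECIAL_CASES.get(ch, ch) for ch in text)
    # Then split the remaining accented letters into base letter + mark
    # and drop the combining marks.
    decomposed = unicodedata.normalize("NFKD", replaced)
    return "".join(ch for ch in decomposed if not unicodedata.combining(ch))


print(remove_accents_sketch("ÅÆØĆ"))  # AAEOC
```

The lookup table is the important design detail: Unicode normalization alone would leave "Æ" and "Ø" unchanged, which is why a replacement table is needed for such letters.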