mismaah

Reputation: 356

Which characters should be omitted from a URL so that it doesn't get encoded? (i.e. more human readable)

I initially had a function with a regex that removed every character except alphanumerics and _, and that worked as intended.

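// titleToURL turns a thread title into a URL slug containing only [a-zA-Z0-9_].
// urlMax is a length limit defined elsewhere in the package.
func titleToURL(title string) string {
    // Replace spaces with underscores, then strip every other disallowed character.
    threadURL := strings.ReplaceAll(title, " ", "_")
    pattern := regexp.MustCompile(`[^a-zA-Z\d_]`)
    threadURL = pattern.ReplaceAllString(threadURL, "")
    // Truncate to the maximum length (byte slicing is safe here: the result is ASCII-only).
    if len(threadURL) > urlMax {
        threadURL = threadURL[:urlMax]
    }
    return threadURL
}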

But I later wanted the URL to allow non-ASCII characters like 香港, so I removed the regex replacement so that such characters would be kept. Since then, when a character like [ or ] is passed in, it gets percent-encoded and the URL fills up with % sequences. I want these URLs to be human-readable, since they are not used for passing data.

In short, I want URLs to look like text_test_url or chinese_香港, not %3F_%5BM%5D.

Which characters should I remove to achieve this? Provide a regex if possible.

Upvotes: 0

Views: 79

Answers (1)

DylanSp

Reputation: 1519

MDN lists the characters that get percent-encoded:

  • :
  • /
  • ?
  • #
  • [
  • ]
  • @
  • !
  • $
  • &
  • '
  • (
  • )
  • *
  • +
  • ,
  • ;
  • =
  • %

So your regex would just check for those characters (escaping them as necessary).
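
As a minimal sketch of that idea (not a definitive implementation), the function from the question could strip the reserved characters instead of allowing only ASCII. The character class below follows the RFC 3986 reserved set that the list above is drawn from (with the apostrophe), plus % and whitespace. The value of urlMax is an assumption, since the question does not show it, and truncating by runes rather than bytes is a change from the original, made so that a multi-byte character such as 港 is never cut in half.

```go
package main

import (
	"fmt"
	"regexp"
	"strings"
)

// urlMax is a hypothetical limit; the question does not show the real value.
const urlMax = 64

// reserved matches the reserved characters listed above, plus % and whitespace.
// Compiled once at package level rather than on every call.
var reserved = regexp.MustCompile(`[:/?#\[\]@!$&'()*+,;=%\s]`)

// titleToURL replaces spaces with underscores, drops reserved characters,
// and truncates the result to at most urlMax runes.
func titleToURL(title string) string {
	threadURL := strings.ReplaceAll(title, " ", "_")
	threadURL = reserved.ReplaceAllString(threadURL, "")
	// Truncate by runes, not bytes, so non-ASCII characters stay intact.
	if runes := []rune(threadURL); len(runes) > urlMax {
		threadURL = string(runes[:urlMax])
	}
	return threadURL
}

func main() {
	fmt.Println(titleToURL("text test url")) // text_test_url
	fmt.Println(titleToURL("chinese 香港"))    // chinese_香港
	fmt.Println(titleToURL("?_[M]"))         // _M
}
```

Note that this only covers the characters listed above. Other characters, such as double quotes or angle brackets, may still be percent-encoded, so add them to the character class if titles can contain them.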

Upvotes: 2
