Books

Reputation: 163

regex match subdomain

I need a regex that matches all the subdomains of a specific domain. The content media type is unknown (e.g. it may be URL-encoded, JSON, etc.). So far I've got this simple regex: re, err := regexp.Compile(`([-a-z0-9])+[.](.*)domain.com`)

The issue is that for a string such as the one below, the whole string is matched instead of just the specific subdomains, such as prefix1-prefix2.domain.com. So the question is: what regexp do I need to get only the subdomains, without the extra string (i.e. %2Fdocument...)?

prefix1-prefix2.domain.com%2Fdocument%2F%3Fusp%3Dmkt_docs%26lng%3Den&followup=https%3A%2F%2Fprefix1-prefix2.domain.com%2Fdocument%2F%3Fusp%3Dmkt_docs%26lng%

Upvotes: 1

Views: 3034

Answers (2)

Eugene Lisitsky

Reputation: 12875

I recommend using the net/url package, which can parse the URL for you, and then checking the domain. Otherwise you can run into pitfalls such as character escaping.

https://play.golang.org/p/6so9FTfOSo

Upd: updated the recipe. Also, there's an error in the source URL: the last % is not followed by a hex code.

package main

import (
    "fmt"
    "net/url"
    "strings"
)

var s = "prefix1-prefix2.domain.com%2Fdocument%2F%3Fusp%3Dmkt_docs%26lng%3Den&followup=https%3A%2F%2Fprefix1-prefix2.domain.com%2Fdocument%2F%3Fusp%3Dmkt_docs%26lng"

func main() {
    // Decode the percent-encoded input before parsing it.
    s2, err := url.QueryUnescape(s)
    if err != nil {
        fmt.Println("unescape error:", err)
        return
    }
    u, err := url.Parse(s2)
    if err != nil {
        fmt.Println("parse error:", err)
        return
    }
    // Without a scheme the host ends up in u.Path, so add one and parse again.
    if u.Scheme == "" {
        u, err = url.Parse("https://" + s2)
        if err != nil {
            fmt.Println("parse error:", err)
            return
        }
    }

    fmt.Println(u.Host)
    if strings.HasSuffix(u.Host, ".domain.com") {
        fmt.Println("subdomain!")
    }
}
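
The remark about the trailing % can be checked directly. The following is a small sketch, not part of the original recipe; the helper name decodes is made up for illustration. It relies on url.QueryUnescape returning an error when a % is not followed by two hex digits.

```go
package main

import (
	"fmt"
	"net/url"
)

// decodes is a hypothetical helper: it reports whether s can be
// percent-decoded without error.
func decodes(s string) bool {
	_, err := url.QueryUnescape(s)
	return err == nil
}

func main() {
	fmt.Println(decodes("lng%3Den")) // well-formed escape sequence
	fmt.Println(decodes("lng%"))     // dangling percent sign, as in the question's string
}
```

This is why the string s above drops the final % from the question's sample: with it, decoding fails before the URL can be parsed.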

Upvotes: 2

Matt

Reputation: 3760

The greedy (.*) in your regex is matching all the stuff that you don't want, so you can remove it. I also escaped your . characters and moved your + inside the capture group.

Change your regex to: ([-a-z0-9]+)\.domain\.com or in the context of your code:

regexp.Compile(`([-a-z0-9]+)\.domain\.com`)

Here's a regex101 fiddle you can test it with.

Upvotes: 0
