Reputation: 163
I need a regex that matches all the subdomains of a specific domain. The content media type is unknown (e.g. it may be URL-encoded, JSON, etc.).
So far I've got this simple regex:
re, err := regexp.Compile(`([-a-z0-9])+[.](.*)domain.com`)
The issue is that for a string such as the one below, the whole string is matched instead of just the subdomain, such as prefix1-prefix2.domain.com.
So the question is: what regexp do I need to get only the subdomains, without the extra string (i.e. %2Fdocument...)?
prefix1-prefix2.domain.com%2Fdocument%2F%3Fusp%3Dmkt_docs%26lng%3Den&followup=https%3A%2F%2Fprefix1-prefix2.domain.com%2Fdocument%2F%3Fusp%3Dmkt_docs%26lng%
Upvotes: 1
Views: 3034
Reputation: 12875
I recommend using the net/url package, which can parse the URL for you, and then checking the domain. Otherwise you can run into pitfalls such as escaped characters.
https://play.golang.org/p/6so9FTfOSo
Upd: updated the recipe. Also, there's an error in the source URL: the last % is not followed by a hex code.
package main

import (
	"fmt"
	"net/url"
	"strings"
)

var s = "prefix1-prefix2.domain.com%2Fdocument%2F%3Fusp%3Dmkt_docs%26lng%3Den&followup=https%3A%2F%2Fprefix1-prefix2.domain.com%2Fdocument%2F%3Fusp%3Dmkt_docs%26lng"

func main() {
	// Decode the percent-encoded input before parsing it.
	s2, err := url.QueryUnescape(s)
	if err != nil {
		fmt.Println("unescape error:", err)
		return
	}
	u, err := url.Parse(s2)
	if err != nil {
		fmt.Println("parse error:", err)
		return
	}
	// Without a scheme the host ends up in u.Path, so add one and parse again.
	if u.Scheme == "" {
		s2 = "https://" + s2
		u, err = url.Parse(s2)
		if err != nil {
			fmt.Println("parse error:", err)
			return
		}
	}
	fmt.Println(u.Host)
	if strings.HasSuffix(u.Host, ".domain.com") {
		fmt.Println("subdomain!")
	}
}
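The same check can be wrapped in a reusable function. This is only a sketch: the name subdomainHost and its signature are made up for illustration, it assumes https when the input has no scheme, and it only looks at the first host in the string (not at hosts nested inside query parameters such as followup).

```go
package main

import (
	"fmt"
	"net/url"
	"strings"
)

// subdomainHost is a hypothetical helper. It percent-decodes raw, parses the
// result as a URL (assuming https when no scheme is present) and returns the
// lower-cased host name if it ends in "."+domain.
func subdomainHost(raw, domain string) (string, bool) {
	decoded, err := url.QueryUnescape(raw)
	if err != nil {
		return "", false // malformed escape, e.g. a trailing "%"
	}
	u, err := url.Parse(decoded)
	if err != nil {
		return "", false
	}
	if u.Scheme == "" {
		// Without a scheme the host would be parsed as part of the path.
		u, err = url.Parse("https://" + decoded)
		if err != nil {
			return "", false
		}
	}
	host := strings.ToLower(u.Hostname()) // Hostname drops any port
	if strings.HasSuffix(host, "."+domain) {
		return host, true
	}
	return "", false
}

func main() {
	host, ok := subdomainHost("prefix1-prefix2.domain.com%2Fdocument%2F%3Fusp%3Dmkt_docs", "domain.com")
	fmt.Println(host, ok)
}
```

Matching on the "." + domain suffix means the bare domain itself and look-alikes such as evil-domain.com are not reported as subdomains.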
Upvotes: 2
Reputation: 3760
The greedy (.*) in your regex is matching all the stuff that you don't want; you can remove it. Also, I escaped your . characters and moved your + inside the capture group.
Change your regex to: ([-a-z0-9]+)\.domain\.com
or in the context of your code:
regexp.Compile(`([-a-z0-9]+)\.domain\.com`)
Here's a regex101 fiddle you can test it with.
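To pull every occurrence out of the sample string, a pattern like this can be used with FindAllString. This is a minimal sketch, not the exact regex above: it swaps the capture group for a repeated non-capturing group so that nested subdomains (a.b.domain.com) are returned whole, which is an assumption about what is wanted. The names subdomainRe and findSubdomains are made up for illustration.

```go
package main

import (
	"fmt"
	"regexp"
)

// subdomainRe matches one or more dot-terminated labels followed by "domain.com".
// The character class excludes '%', '/', '&' and '=', so a match cannot run
// into the percent-encoded remainder of the string.
var subdomainRe = regexp.MustCompile(`(?:[-a-z0-9]+\.)+domain\.com`)

// findSubdomains returns every host-like match in s, in order of appearance.
func findSubdomains(s string) []string {
	return subdomainRe.FindAllString(s, -1)
}

func main() {
	s := "prefix1-prefix2.domain.com%2Fdocument%2F%3Fusp%3Dmkt_docs%26lng%3Den&followup=https%3A%2F%2Fprefix1-prefix2.domain.com%2Fdocument%2F%3Fusp%3Dmkt_docs%26lng%"
	for _, host := range findSubdomains(s) {
		fmt.Println(host)
	}
}
```

One caveat when running this on text that is still percent-encoded: if the escapes use lowercase hex digits (for example %2f instead of %2F), those digits fall inside the character class and would be glued onto the front of the match. Decoding the input first avoids that.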
Upvotes: 0