Reputation: 391
I'm learning Golang so I can rewrite some of my shell scripts.
I have URLs that look like this:
https://example-1.example.com/a/c482dfad3573acff324c/list.txt?parm1=value,parm2=value,parm3=https://example.com/a?parm1=value,parm2=value
I want to extract the following part:
https://example-1.example.com/a/c482dfad3573acff324c/list.txt
In a shell script I would do something like this:
echo "$myString" | grep -oE 'https?://.*\.txt'
What is the best way to do the same thing in Golang, only by using the standard library?
Upvotes: 8
Views: 28911
Reputation: 1
I used the regexp package to extract a string from a string.
In this example I wanted to extract the text between <PERSON> and </PERSON>. I did this with the re expression, then removed the <PERSON> and </PERSON> tags with the re1 expression.
The for loop handles multiple matches, and re1 is used for the replacement.
package main
import (
	"fmt"
	"regexp"
)

func main() {
	re := regexp.MustCompile(`<PERSON>(.*?)</PERSON>`)
	// re1 matches both the opening <PERSON> and the closing </PERSON> tag.
	re1 := regexp.MustCompile(`<(.?)PERSON>`)
	string_l := "java -mx500m -cp stanford-ner.jar edu.stanford.nlp.ie.crf.CRFClassifier -loadClassifier classifiers/english.all.3class.distsim.crf.ser.gz -textFile PatrickYe.txt -outputFormat inlineXML 2> /dev/null I complained to <ORGANIZATION>Microsoft</ORGANIZATION> about <PERSON>Bill Gates</PERSON>.They told me to see the mayor of <PERSON>New York</PERSON>.,"
	x := re.FindAllString(string_l, -1)
	fmt.Println(x)
	for v, st := range x {
		y1 := re1.ReplaceAllLiteralString(st, "")
		fmt.Println(v, st, " : sdf : ", y1)
	}
}
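As a possible simplification, the capture group in the first expression can be read directly with FindAllStringSubmatch, which avoids the second replace step. This is only a sketch: extractPersons and personRe are hypothetical names, and the input is a shortened version of the string above.

```go
package main

import (
	"fmt"
	"regexp"
)

// personRe captures the text between <PERSON> and </PERSON> (non-greedy).
var personRe = regexp.MustCompile(`<PERSON>(.*?)</PERSON>`)

// extractPersons is a hypothetical helper: it returns only the captured
// names, so no second regexp is needed to strip the tags.
func extractPersons(text string) []string {
	var names []string
	for _, m := range personRe.FindAllStringSubmatch(text, -1) {
		names = append(names, m[1]) // m[0] is the full match, m[1] is the group
	}
	return names
}

func main() {
	text := "I complained to <ORGANIZATION>Microsoft</ORGANIZATION> about <PERSON>Bill Gates</PERSON>. They told me to see the mayor of <PERSON>New York</PERSON>."
	fmt.Println(extractPersons(text)) // [Bill Gates New York]
}
```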
Upvotes: 0
Reputation: 120941
There are a few options:
// match regexp as in question
pat := regexp.MustCompile(`https?://.*\.txt`)
s := pat.FindString(myString)
// everything before the query
s := strings.Split(myString, "?")[0]
// same as previous, but avoids []string allocation
s := myString
if i := strings.IndexByte(s, '?'); i >= 0 {
	s = s[:i]
}
// parse and clear query string
u, err := url.Parse(myString)
if err != nil {
	log.Fatal(err) // or handle the parse error some other way
}
u.RawQuery = ""
s := u.String()
The last option is the best because it parses the URL instead of pattern-matching on it, so it handles corner cases that the string-based options can miss.
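For reference, here is a minimal runnable sketch of the url.Parse approach wrapped in a function. The name stripQuery is hypothetical, and clearing ForceQuery is an extra step I am assuming is wanted so that an input ending in a bare "?" does not keep the "?".

```go
package main

import (
	"fmt"
	"net/url"
)

// stripQuery is a hypothetical helper. It parses rawURL and returns it
// without the query string, or an error if the URL cannot be parsed.
func stripQuery(rawURL string) (string, error) {
	u, err := url.Parse(rawURL)
	if err != nil {
		return "", err
	}
	u.RawQuery = ""      // drop everything after the first '?'
	u.ForceQuery = false // also drop a trailing '?' with an empty query
	return u.String(), nil
}

func main() {
	myString := "https://example-1.example.com/a/c482dfad3573acff324c/list.txt?parm1=value,parm2=value,parm3=https://example.com/a?parm1=value,parm2=value"
	s, err := stripQuery(myString)
	if err != nil {
		fmt.Println("parse error:", err)
		return
	}
	fmt.Println(s) // https://example-1.example.com/a/c482dfad3573acff324c/list.txt
}
```

Note that a fragment (the part after "#"), if present, is kept by this sketch.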
Upvotes: 13
Reputation: 1491
If you are processing only URLs, you can use Go's net/url library (https://golang.org/pkg/net/url/) to parse the URL, truncate the Query and Fragment parts (Query would be parm1=value,parm2=value etc.), and extract the remaining portion scheme://host/path, as in the following example (https://play.golang.org/p/Ao0jU22NyA):
package main
import (
	"fmt"
	"net/url"
)

func main() {
	u, _ := url.Parse("https://example-1.example.com/a/b/c/list.txt?parm1=value,parm2=https%3A%2F%2Fexample.com%2Fa%3Fparm1%3Dvalue%2Cparm2%3Dvalue#somefragment")
	u.RawQuery, u.Fragment = "", ""
	fmt.Printf("%s\n", u)
}
Output:
https://example-1.example.com/a/b/c/list.txt
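An alternative sketch of the same idea builds a new URL from only the scheme, host and path, instead of clearing fields on the parsed one. The helper name baseURL and the shortened input are assumptions for illustration. Note that this variant also drops any user info, and that a path containing percent-encoded characters may be re-encoded differently.

```go
package main

import (
	"fmt"
	"net/url"
)

// baseURL is a hypothetical helper that keeps only scheme://host/path.
func baseURL(raw string) (string, error) {
	u, err := url.Parse(raw)
	if err != nil {
		return "", err
	}
	// Copy only the three components we want; query and fragment are left out.
	base := url.URL{Scheme: u.Scheme, Host: u.Host, Path: u.Path}
	return base.String(), nil
}

func main() {
	s, err := baseURL("https://example.com/a/b/c/list.txt?parm1=value#somefragment")
	if err != nil {
		fmt.Println("parse error:", err)
		return
	}
	fmt.Println(s) // https://example.com/a/b/c/list.txt
}
```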
Upvotes: 2
Reputation:
You may use strings.IndexRune, strings.IndexByte, strings.Split, strings.SplitAfter, strings.FieldsFunc, url.Parse, regexp, or your own function.
First, the simplest way: use i := strings.IndexRune(s, '?') or i := strings.IndexByte(s, '?'), then take s[:i], like this (with the output in a comment):
package main
import "fmt"
import "strings"
func main() {
	s := `https://example-1.example.com/a/c482dfad3573acff324c/list.txt?parm1=value,parm2=value,parm3=https://example.com/a?parm1=value,parm2=value`
	i := strings.IndexByte(s, '?')
	if i != -1 {
		fmt.Println(s[:i]) // https://example-1.example.com/a/c482dfad3573acff324c/list.txt
	}
}
Or you may use url.Parse(s) (I'd use this):
package main
import "fmt"
import "net/url"
func main() {
	s := `https://example-1.example.com/a/c482dfad3573acff324c/list.txt?parm1=value,parm2=value,parm3=https://example.com/a?parm1=value,parm2=value`
	u, err := url.Parse(s) // named u so it does not shadow the url package
	if err == nil {
		u.RawQuery = ""
		fmt.Println(u.String()) // https://example-1.example.com/a/c482dfad3573acff324c/list.txt
	}
}
Or you may use regexp.MustCompile(".*\\.txt"):
package main
import "fmt"
import "regexp"
var rgx = regexp.MustCompile(`.*\.txt`)
func main() {
	s := `https://example-1.example.com/a/c482dfad3573acff324c/list.txt?parm1=value,parm2=value,parm3=https://example.com/a?parm1=value,parm2=value`
	fmt.Println(rgx.FindString(s)) // https://example-1.example.com/a/c482dfad3573acff324c/list.txt
}
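One caveat with the pattern `.*\.txt`: it is greedy, so it matches up to the last ".txt" in the string, which may be inside the query. A sketch of a stricter alternative is below; the beforeQuery pattern and the sample input are assumptions for illustration, not part of the question.

```go
package main

import (
	"fmt"
	"regexp"
)

var (
	// greedyTxt is the greedy pattern: everything up to the last ".txt".
	greedyTxt = regexp.MustCompile(`.*\.txt`)
	// beforeQuery is a hypothetical alternative: everything from the start
	// of the string up to (not including) the first '?' or '#'.
	beforeQuery = regexp.MustCompile(`^[^?#]*`)
)

func main() {
	// Assumed input whose query string also contains ".txt".
	s := "https://example.com/list.txt?file=other.txt"
	fmt.Println(greedyTxt.FindString(s))   // https://example.com/list.txt?file=other.txt
	fmt.Println(beforeQuery.FindString(s)) // https://example.com/list.txt
}
```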
Or you may use splits := strings.FieldsFunc(s, func(r rune) bool { return r == '?' }), then splits[0]:
package main
import "fmt"
import "strings"
func main() {
	s := `https://example-1.example.com/a/c482dfad3573acff324c/list.txt?parm1=value,parm2=value,parm3=https://example.com/a?parm1=value,parm2=value`
	splits := strings.FieldsFunc(s, func(r rune) bool { return r == '?' })
	fmt.Println(splits[0]) // https://example-1.example.com/a/c482dfad3573acff324c/list.txt
}
Or you may use splits := strings.Split(s, "?"), then splits[0]:
package main
import "fmt"
import "strings"
func main() {
	s := `https://example-1.example.com/a/c482dfad3573acff324c/list.txt?parm1=value,parm2=value,parm3=https://example.com/a?parm1=value,parm2=value`
	splits := strings.Split(s, "?")
	fmt.Println(splits[0]) // https://example-1.example.com/a/c482dfad3573acff324c/list.txt
}
Or you may use splits := strings.SplitAfter(s, ".txt"), then splits[0]:
package main
import "fmt"
import "strings"
func main() {
	s := `https://example-1.example.com/a/c482dfad3573acff324c/list.txt?parm1=value,parm2=value,parm3=https://example.com/a?parm1=value,parm2=value`
	splits := strings.SplitAfter(s, ".txt")
	fmt.Println(splits[0]) // https://example-1.example.com/a/c482dfad3573acff324c/list.txt
}
Or you may write your own function (the most independent way):
package main
import "fmt"
// left returns the part of s before the first '?'.
func left(s string) string {
	for i, r := range s {
		if r == '?' {
			return s[:i]
		}
	}
	return s // no '?' found: the whole string is already the left part
}

func main() {
	s := `https://example-1.example.com/a/c482dfad3573acff324c/list.txt?parm1=value,parm2=value,parm3=https://example.com/a?parm1=value,parm2=value`
	fmt.Println(left(s)) // https://example-1.example.com/a/c482dfad3573acff324c/list.txt
}
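On Go 1.18 or later, the same "everything before the first ?" logic can be sketched with strings.Cut, which returns the whole string when the separator is absent. The helper name beforeQuestionMark is hypothetical.

```go
package main

import (
	"fmt"
	"strings"
)

// beforeQuestionMark is a hypothetical helper. It returns the part of s
// before the first '?', or all of s when there is no '?'.
func beforeQuestionMark(s string) string {
	before, _, _ := strings.Cut(s, "?") // requires Go 1.18+
	return before
}

func main() {
	s := `https://example-1.example.com/a/c482dfad3573acff324c/list.txt?parm1=value,parm2=value,parm3=https://example.com/a?parm1=value,parm2=value`
	fmt.Println(beforeQuestionMark(s)) // https://example-1.example.com/a/c482dfad3573acff324c/list.txt
}
```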
Upvotes: 4