Reputation: 35
What's the best way to extract inner substrings from strings in Golang?
input:
"Hello <p> this is paragraph </p> this is junk <p> this is paragraph 2 </p> this is junk 2"
output:
"this is paragraph \n
this is paragraph 2"
Is there any string package/library for Go that already does something like this?
package main

import (
	"fmt"
	"strings"
)

func main() {
	longString := "Hello world <p> this is paragraph </p> this is junk <p> this is paragraph 2 </p> this is junk 2"
	newString := getInnerStrings("<p>", "</p>", longString)
	fmt.Println(newString)
	// output: this is paragraph \n
	//         this is paragraph 2
}

func getInnerStrings(start, end, str string) string {
	// Brain freeze
	// Regex?
	// Bytes loop?
}
Thanks.
Upvotes: 2
Views: 7117
Reputation: 1714
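Here is a function that I have been using a lot. It returns the text between the first occurrence of prefix and the next occurrence of suffix, so it only gives you the first match.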
func GetInnerSubstring(str string, prefix string, suffix string) string {
	var beginIndex, endIndex int
	beginIndex = strings.Index(str, prefix)
	if beginIndex == -1 {
		beginIndex = 0
		endIndex = 0
	} else if len(prefix) == 0 {
		beginIndex = 0
		endIndex = strings.Index(str, suffix)
		if endIndex == -1 || len(suffix) == 0 {
			endIndex = len(str)
		}
	} else {
		beginIndex += len(prefix)
		endIndex = strings.Index(str[beginIndex:], suffix)
		if endIndex == -1 {
			if strings.Index(str, suffix) < beginIndex {
				endIndex = beginIndex
			} else {
				endIndex = len(str)
			}
		} else {
			if len(suffix) == 0 {
				endIndex = len(str)
			} else {
				endIndex += beginIndex
			}
		}
	}
	return str[beginIndex:endIndex]
}
You can try it at the playground, https://play.golang.org/p/Xo0SJu0Vq4.
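Since the function above stops at the first match and the question asks for every paragraph, here is a minimal sketch of one way to collect all matches by calling strings.Index in a loop. The name getInnerStrings comes from the question; trimming the surrounding spaces and joining the results with a newline are assumptions based on the expected output shown there, and unterminated sections are simply ignored.

```go
package main

import (
	"fmt"
	"strings"
)

// getInnerStrings returns every substring of str found between start and end,
// trimmed of surrounding whitespace and joined with newlines.
// Assumption: the delimiters are plain, non-nested markers. This is a string
// scan, not an HTML parser.
func getInnerStrings(start, end, str string) string {
	if start == "" || end == "" {
		return "" // empty delimiters would match everywhere; treat as no match
	}
	var parts []string
	rest := str
	for {
		i := strings.Index(rest, start)
		if i == -1 {
			break // no more opening delimiters
		}
		rest = rest[i+len(start):]
		j := strings.Index(rest, end)
		if j == -1 {
			break // unterminated section: ignore it
		}
		parts = append(parts, strings.TrimSpace(rest[:j]))
		rest = rest[j+len(end):]
	}
	return strings.Join(parts, "\n")
}

func main() {
	longString := "Hello world <p> this is paragraph </p> this is junk <p> this is paragraph 2 </p> this is junk 2"
	fmt.Println(getInnerStrings("<p>", "</p>", longString))
}
```

If you need the inner text untouched, drop the strings.TrimSpace call; if you would rather have a slice than one joined string, return parts directly.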
Upvotes: 1
Reputation: 407
StrExtract retrieves a string between two delimiters.
StrExtract(sExper, sAdelim, sCdelim, nOccur)
sExper: Specifies the expression to search.
sAdelim: Specifies the string that delimits the beginning of the text to extract.
sCdelim: Specifies the string that delimits the end of the text to extract.
nOccur: Specifies at which occurrence of sAdelim in sExper to start the extraction (counting from 1).
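package main

import (
	"fmt"
	"strings"
)

func main() {
	s := "a11ba22ba333ba4444ba55555ba666666b"
	// nOccur is 1-based: this extracts "55555".
	fmt.Println("StrExtract1: ", StrExtract(s, "a", "b", 5))
}

func StrExtract(sExper, sAdelim, sCdelim string, nOccur int) string {
	// Each element after the first starts right after one occurrence of sAdelim.
	aExper := strings.Split(sExper, sAdelim)
	// Guard against non-positive values (a negative index would panic)
	// and against asking for more occurrences than exist.
	if nOccur < 1 || len(aExper) <= nOccur {
		return ""
	}
	sMember := aExper[nOccur]
	aExper = strings.Split(sMember, sCdelim)
	// A single element means the closing delimiter was not found.
	if len(aExper) == 1 {
		return ""
	}
	return aExper[0]
}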
Upvotes: 0
Reputation: 24898
Don't use regular expressions to interpret HTML. Use a fully capable HTML tokenizer and parser, such as the golang.org/x/net/html package.
I recommend you read this article on CodingHorror.
Upvotes: 6