Reputation: 7385
I am just beginning to learn Go and have written a function that parses Markdown files with a header containing some metadata (the files are blog posts).
Here is an example:
---
Some title goes here
19 September 2012
---
This is some content, read it.
I've written this function, which works, but I feel it's quite verbose and messy. I've had a look at the various string packages, but I don't know enough about Go and its best practices to know what I should be doing differently. If I could get some tips to clean this up, I would appreciate it. (Also, I know that I shouldn't be neglecting that error.)
type Post struct {
	Title string
	Date  string
	Body  string
}

func loadPost(title string) *Post {
	filename := title + ".md"
	file, _ := ioutil.ReadFile("posts/" + filename)
	fileString := string(file)
	str := strings.Split(fileString, "---")
	meta := strings.Split(str[1], "\n")
	title = meta[1]
	date := meta[2]
	body := str[2]
	return &Post{Title: title, Date: date, Body: body}
}
Upvotes: 2
Views: 206
Reputation: 10857
Sonia has some great suggestions. Below is my take which accounts for problems you might encounter when parsing the header.
http://play.golang.org/p/w-XYyhPj9n
package main

import (
	"fmt"
	"strings"
)

const sep = "---"

type parseError struct {
	msg string
}

func (e *parseError) Error() string {
	return e.msg
}

func parse(s string) (header []string, content string, err error) {
	if !strings.HasPrefix(s, sep) {
		return header, content, &parseError{"content does not start with `---`!"}
	}
	arr := strings.SplitN(s, sep, 3)
	if len(arr) < 3 {
		return header, content, &parseError{"header was not terminated with `---`!"}
	}
	header = strings.Split(strings.TrimSpace(arr[1]), "\n")
	content = strings.TrimSpace(arr[2])
	return header, content, nil
}

func main() {
	// Well-formed post: the header and content parse cleanly.
	f := `---
Some title goes here
19 September 2012
---
This is some content, read it. --Anonymous`
	header, content, err := parse(f)
	if err != nil {
		panic(err)
	}
	for i, val := range header {
		fmt.Println(i, val)
	}
	fmt.Println("---")
	fmt.Println(content)
	// Malformed post: the header is never closed with a second ---.
	f = `---
Some title goes here
19 September 2012
This is some content, read it.`
	_, _, err = parse(f)
	fmt.Println("Error:", err)
	// Malformed post: the text does not start with --- (it starts with a newline).
	f = `
Some title goes here
19 September 2012
---
This is some content, read it.`
	_, _, err = parse(f)
	fmt.Println("Error:", err)
}
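To tie this back to the loadPost in the question, here is a rough sketch of how the read error and a short header could be reported to the caller instead of being dropped. The parse function is a condensed copy of the one above; newPost is a made-up helper name, and os.ReadFile (Go 1.16 or later) is swapped in for the ioutil.ReadFile call in the question. Treat it as one possible shape, not the only way to do it.

```go
package main

import (
	"errors"
	"fmt"
	"os"
	"strings"
)

const sep = "---"

type Post struct {
	Title string
	Date  string
	Body  string
}

// parse splits s into header lines and content (condensed from the answer above).
func parse(s string) (header []string, content string, err error) {
	if !strings.HasPrefix(s, sep) {
		return nil, "", errors.New("content does not start with ---")
	}
	arr := strings.SplitN(s, sep, 3)
	if len(arr) < 3 {
		return nil, "", errors.New("header was not terminated with ---")
	}
	return strings.Split(strings.TrimSpace(arr[1]), "\n"), strings.TrimSpace(arr[2]), nil
}

// newPost is a hypothetical helper that builds a Post from raw file contents.
func newPost(raw string) (*Post, error) {
	header, content, err := parse(raw)
	if err != nil {
		return nil, err
	}
	// Assumption: the header holds a title line followed by a date line.
	if len(header) < 2 {
		return nil, fmt.Errorf("header has %d line(s), want a title and a date", len(header))
	}
	return &Post{Title: header[0], Date: header[1], Body: content}, nil
}

// loadPost reads posts/<title>.md and passes read or parse errors to the caller.
func loadPost(title string) (*Post, error) {
	data, err := os.ReadFile("posts/" + title + ".md")
	if err != nil {
		return nil, err
	}
	return newPost(string(data))
}

func main() {
	p, err := newPost("---\nSome title goes here\n19 September 2012\n---\nThis is some content, read it.")
	if err != nil {
		fmt.Println("Error:", err)
		return
	}
	fmt.Printf("%q %q %q\n", p.Title, p.Date, p.Body)
}
```

Keeping the file read in loadPost and the parsing in newPost means the parsing can be exercised with plain strings, without touching the disk.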
Upvotes: 1
Reputation: 28355
I agree that it's not bad. I'll add a couple of other ideas.
As Thomas showed, you don't need the intermediate variables title, date, and body. Try this, though:
return &Post{
	Title: meta[1],
	Date:  meta[2],
	Body:  body,
}
It's true that you can leave the field names out, but I sometimes like them to keep the code self-documenting. (I think go vet likes them too.)
I fuss over strings versus byte slices, but probably more than I should. Since you're reading the file in one gulp, you probably don't need to worry about this. Converting everything to one big string and then slicing up the string is a handy way of doing things, just remember that you're pinning the entire string in memory if you keep any part of it. If your files are large or you have lots of them and you only end up keeping, say, the meta for most of them, this might not be the way to go.
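If the pinning does turn out to matter, one option is to copy the part you keep so the big string can be collected. A minimal sketch, assuming Go 1.18 or later for strings.Clone; metaOnly is a made-up helper name, and it assumes the file starts with --- and has a closing ---.

```go
package main

import (
	"fmt"
	"strings"
)

// metaOnly is a hypothetical helper: it returns the header block of a post
// as a freshly allocated string, so the caller can drop the full contents.
// Assumption: the post starts with "---" and the header is closed by "---".
func metaOnly(fileString string) string {
	parts := strings.SplitN(fileString, "---", 3)
	if len(parts) < 3 {
		return ""
	}
	// parts[1] still shares memory with fileString; Clone copies only these bytes.
	return strings.Clone(strings.TrimSpace(parts[1]))
}

func main() {
	big := "---\nSome title goes here\n19 September 2012\n---\n" + strings.Repeat("body text ", 100000)
	meta := metaOnly(big)
	fmt.Println(len(big), len(meta))
	fmt.Println(meta)
}
```

Without the Clone call, holding on to meta would keep the whole of big alive, because a substring points into the original string's memory.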
There's just one blog entry per file? If so, I think I'll propose a variant of Thomas's suggestion. Verify the first bytes are --- (or your file is corrupt), then use strings.Index(fileString[3:], "---"). Split is more appropriate when you have an unknown number of segments. In your case you're just looking for that single separator after the meta. Index will find it after searching the meta and be done, without searching through the whole body. (And anyway, what if the body contained the string "---"?)
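Roughly what I have in mind, as a sketch rather than a finished function. The name splitPost and the error messages are mine; trimming whitespace around the two parts is an assumption about what you want.

```go
package main

import (
	"errors"
	"fmt"
	"strings"
)

const sep = "---"

// splitPost is a hypothetical helper that returns the raw header text and the body.
// It searches only up to the first separator that follows the opening one,
// so a "---" inside the body is left alone.
func splitPost(fileString string) (meta, body string, err error) {
	if !strings.HasPrefix(fileString, sep) {
		return "", "", errors.New("post does not start with ---")
	}
	rest := fileString[len(sep):]
	i := strings.Index(rest, sep)
	if i < 0 {
		return "", "", errors.New("header is not terminated with ---")
	}
	meta = strings.TrimSpace(rest[:i])
	body = strings.TrimSpace(rest[i+len(sep):])
	return meta, body, nil
}

func main() {
	meta, body, err := splitPost("---\nSome title goes here\n19 September 2012\n---\nIntro\n---\nMore text")
	if err != nil {
		fmt.Println("Error:", err)
		return
	}
	fmt.Printf("%q\n%q\n", meta, body)
}
```

A --- inside the header itself would still be taken for the closing separator, so this only protects the body.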
Finally, some people would use regular expressions for this. I still haven't warmed up to regular expressions, but anyway, it's another approach.
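For completeness, a regular expression version might look something like this. The pattern is my own guess at the format (a --- line, a title line, a date line, a --- line, then the body), and matchPost is a made-up name, so adjust it to whatever your files really look like.

```go
package main

import (
	"fmt"
	"regexp"
)

// postRE is an assumed pattern: an opening "---" line, one title line, one
// date line, a closing "---" line, then the body. The (?s) flag lets the
// final .* run across newlines to the end of the text.
var postRE = regexp.MustCompile(`(?s)\A---\n([^\n]*)\n([^\n]*)\n---\n(.*)\z`)

// matchPost is a hypothetical helper; ok is false when the text does not fit the pattern.
func matchPost(s string) (title, date, body string, ok bool) {
	m := postRE.FindStringSubmatch(s)
	if m == nil {
		return "", "", "", false
	}
	return m[1], m[2], m[3], true
}

func main() {
	title, date, body, ok := matchPost("---\nSome title goes here\n19 September 2012\n---\nThis is some content, read it.")
	fmt.Println(ok)
	fmt.Printf("%q %q %q\n", title, date, body)
}
```

The pattern is strict about the header having exactly two lines, which doubles as validation but means any extra metadata line makes the match fail.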
Upvotes: 2
Reputation: 4115
I think it's not bad. A couple of suggestions:
You can leave the field names out of the struct literal:
&Post{title, date, body}
Alternatively, you could find where the body starts with strings.LastIndex(s, "--") and use that to index the file contents accordingly. This avoids the allocation of using Split on the whole file.
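const sep = "--"

func loadPost(content string) *Post {
	sepLength := len(sep)
	// LastIndex matches the final two dashes of the closing "---" line.
	i := strings.LastIndex(content, sep)
	// headers begins with the leftover "-" of the opening line, so meta[0] is "-".
	headers := content[sepLength:i]
	// Skip the separator and the newline that follows it.
	body := content[i+sepLength+1:]
	meta := strings.Split(headers, "\n")
	return &Post{meta[1], meta[2], body}
}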
Upvotes: 4