Reputation: 3812
I'm trying to use Clojure to read, line by line, a file that may or may not have YAML frontmatter, and return a hash map with two vectors: one containing the frontmatter lines and one containing everything else (i.e., the body).
An example input file would look like this:
---
key1: value1
key2: value2
---
Body text paragraph 1
Body text paragraph 2
Body text paragraph 3
I have functioning code that does this, but to my (admittedly inexperienced with Clojure) nose, it reeks of code smell.
(require '[clojure.string :as string])

(defn process-file [f]
  (with-open [rdr (java.io.BufferedReader. (java.io.FileReader. f))]
    (loop [lines (line-seq rdr) in-fm 0 frontmatter [] body []]
      (if-not (empty? lines)
        (let [line (string/trim (first lines))]
          (cond
            (zero? (count line))
            (recur (rest lines) in-fm frontmatter body)
            (and (< in-fm 2) (= line "---"))
            (recur (rest lines) (inc in-fm) frontmatter body)
            (= in-fm 1)
            (recur (rest lines) in-fm (conj frontmatter line) body)
            :else
            (recur (rest lines) in-fm frontmatter (conj body line))))
        (hash-map :frontmatter frontmatter :body body)))))
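For comparison, here is a sketch of the same state machine written as a reduce over any seq of lines instead of a loop over the reader. The name split-frontmatter is hypothetical, and the rules are meant to mirror the cond above (skip blanks, the first two "---" lines delimit the frontmatter, everything else is body); update assumes Clojure 1.7 or later.

```clojure
(require '[clojure.string :as string])

;; Hypothetical name. A pure function over any seq of lines that follows
;; the same rules as the cond in process-file, driven by reduce instead of loop.
(defn split-frontmatter [lines]
  (let [step (fn [{:keys [in-fm] :as acc} raw]
               (let [line (string/trim raw)]
                 (cond
                   ;; blank lines are skipped
                   (string/blank? line) acc
                   ;; the first two "---" lines open and close the frontmatter
                   (and (< in-fm 2) (= line "---")) (update acc :in-fm inc)
                   ;; between the two markers: frontmatter
                   (= in-fm 1) (update acc :frontmatter conj line)
                   ;; everything else: body
                   :else (update acc :body conj line))))]
    ;; drop the internal counter from the final result
    (dissoc (reduce step {:in-fm 0 :frontmatter [] :body []} lines)
            :in-fm)))

;; reduce is eager, so it is safe to call inside with-open:
;; (with-open [rdr (clojure.java.io/reader f)]
;;   (split-frontmatter (line-seq rdr)))
```

Keeping the state in a single accumulator map means adding another parsing state later is one more cond branch rather than one more loop binding threaded through every recur.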
Can someone point me to a more elegant way to do this? I'm going to be doing a decent amount of line-by-line parsing in this project, and I'd like a more idiomatic way of going about it if possible.
Upvotes: 3
Views: 1095
Reputation: 883
Actually, the idiomatic way to do this in Clojure would be to avoid returning 'a hashmap with two vectors' and instead treat the file as a (lazy) sequence of lines.
Then the function that processes the sequence of lines decides whether or not the file has YAML frontmatter, something like this:
(use '[clojure.java.io :only (reader)])
(let [s (line-seq (reader "YOURFILENAMEHERE"))]
  ;; line-seq strips line terminators, so compare the first line with "---", not "---\n"
  (if (= "---" (first s))
    (process-seq-with-frontmatter s)
    (process-seq-without-frontmatter s)))
By the way, this is a quick and dirty solution; two things to improve:
Upvotes: 0
Reputation: 84369
Firstly, I'd put the line-processing logic in its own function, to be called from a function that actually reads in the files. Better yet, you can make the function dealing with IO take, as an argument, the function to apply to the seq of lines, perhaps along these lines:
(require '[clojure.java.io :as io])

(defn process-file-with [f filename]
  (with-open [rdr (io/reader (io/file filename))]
    (f (line-seq rdr))))
Note that this arrangement makes it the duty of f to realize as much of the line seq as it needs before it returns (because afterwards with-open will close the underlying reader of the line seq).
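To make the point about realization concrete, here is a small sketch that writes a throwaway temp file (its contents are made up for the demo) and calls process-file-with twice: once with vec, which reads every line while the reader is still open, and once with identity, which hands back a lazy seq. clojure.core's line-seq reads its first line eagerly and the rest lazily, so the failure only shows up when something walks past the first line after with-open has closed the reader.

```clojure
(require '[clojure.java.io :as io])

;; Same helper as in the answer, repeated so this snippet runs on its own.
(defn process-file-with [f filename]
  (with-open [rdr (io/reader (io/file filename))]
    (f (line-seq rdr))))

;; Throwaway demo file (hypothetical contents), removed when the JVM exits.
(def tmp
  (doto (java.io.File/createTempFile "frontmatter" ".txt")
    (.deleteOnExit)))
(spit tmp "---\nkey1: value1\n---\nBody\n")

;; OK: vec realizes every line while the reader is still open.
(process-file-with vec tmp)
;; => ["---" "key1: value1" "---" "Body"]

;; Not OK: identity returns the lazy seq unrealized; walking its tail after
;; with-open has closed the reader throws, because the stream is closed.
(try
  (doall (process-file-with identity tmp))
  (catch Exception _ :reader-closed))
;; => :reader-closed
```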
Given this division of responsibilities, the line-processing function might look like this, assuming the opening --- must be the first non-blank line and that all blank lines are to be skipped (as they are by the code in the question):
(require '[clojure.string :as string])

(defn process-lines [lines]
  (let [ls (->> lines
                (map string/trim)
                (remove string/blank?))]
    (if (= (first ls) "---")
      (let [[front sep-and-body] (split-with #(not= "---" %) (next ls))]
        {:front (vec front) :body (vec (next sep-and-body))})
      {:body (vec ls)})))
Note the calls to vec, which cause all the lines to be read in and returned in a vector or pair of vectors (so that we can use process-lines with process-file-with without the reader being closed too soon).
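Putting the two pieces together, a minimal end-to-end sketch: it writes a made-up sample file to a temp location and reads it back through process-file-with, with process-lines as the line-processing function. Both definitions are copied from this answer so the snippet runs standalone; the file name prefix and contents are only for illustration.

```clojure
(require '[clojure.java.io :as io]
         '[clojure.string :as string])

;; Both definitions copied from the answer so this runs standalone.
(defn process-file-with [f filename]
  (with-open [rdr (io/reader (io/file filename))]
    (f (line-seq rdr))))

(defn process-lines [lines]
  (let [ls (->> lines
                (map string/trim)
                (remove string/blank?))]
    (if (= (first ls) "---")
      (let [[front sep-and-body] (split-with #(not= "---" %) (next ls))]
        {:front (vec front) :body (vec (next sep-and-body))})
      {:body (vec ls)})))

;; Hypothetical sample file written to a temp location for the demo.
(def sample
  (doto (java.io.File/createTempFile "post" ".md")
    (.deleteOnExit)))

(spit sample (str "---\n"
                  "key1: value1\n"
                  "key2: value2\n"
                  "---\n"
                  "\n"
                  "Body text paragraph 1\n"))

;; process-lines returns vectors, so everything is read before with-open
;; closes the reader.
(process-file-with process-lines sample)
;; => {:front ["key1: value1" "key2: value2"], :body ["Body text paragraph 1"]}
```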
Because reading lines from an actual file on disk is now decoupled from processing a seq of lines, we can easily test the latter part of the process at the REPL (and of course this can be made into a unit test):
;; could input this as a single string and split, of course
(def test-lines
  ["---"
   "key1: value1"
   "key2: value2"
   "---"
   ""
   "Body text paragraph 1"
   ""
   "Body text paragraph 2"
   ""
   "Body text paragraph 3"])
Calling our function now:
user> (process-lines test-lines)
{:front ["key1: value1" "key2: value2"],
 :body ["Body text paragraph 1"
        "Body text paragraph 2"
        "Body text paragraph 3"]}
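Since the REPL check above can be made into a unit test, here is a minimal clojure.test sketch. The test name is hypothetical, and the second input is invented to also cover the branch where there is no leading ---.

```clojure
(require '[clojure.string :as string]
         '[clojure.test :refer [deftest is testing run-tests]])

;; process-lines as defined in the answer, repeated so this file runs alone.
(defn process-lines [lines]
  (let [ls (->> lines
                (map string/trim)
                (remove string/blank?))]
    (if (= (first ls) "---")
      (let [[front sep-and-body] (split-with #(not= "---" %) (next ls))]
        {:front (vec front) :body (vec (next sep-and-body))})
      {:body (vec ls)})))

;; Hypothetical test name.
(deftest process-lines-test
  (testing "frontmatter is split off and blank lines are dropped"
    (is (= {:front ["key1: value1" "key2: value2"]
            :body ["Body text paragraph 1"
                   "Body text paragraph 2"
                   "Body text paragraph 3"]}
           (process-lines ["---" "key1: value1" "key2: value2" "---" ""
                           "Body text paragraph 1" ""
                           "Body text paragraph 2" ""
                           "Body text paragraph 3"]))))
  (testing "without a leading ---, every non-blank line is body"
    (is (= {:body ["just" "text"]}
           (process-lines ["just" "" "  text  "])))))

;; Runs every test defined in the current namespace and prints a summary.
(run-tests)
```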
Upvotes: 6