Reputation: 841
I want to parse and filter a file that looks like this:
@@1 Row one.
@@2 Row two.
I have been able to do the filtering of the rows with the following code:
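(require '[clojure.string :as st])
(defn parse-text-cms [sel-row]
  ;; clojure.string/split takes the string first, then the regex
  (let [f_data (st/split (slurp "cms/tb_cms.txt") #"@@")]
    ;(prn (map #(take 1 %) f_data))
    ;; (first (take 1 %)) is a character, so sel-row must be one too, e.g. \1
    (filter #(= (first (take 1 %)) sel-row) f_data)))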
However, this code gives me (if sel-row=1):
1 Row one.
I would like to chop off that 1 and the space after, so to have:
Row one.
I think there is some sequence magic to do this. I just can't come up with an elegant solution.
Upvotes: 2
Views: 346
Reputation: 4233
Another solution is to use a functional parser library such as dj-peg (which I wrote).
https://github.com/bmillare/dj-peg
Then you can write this:
(require '[dj-peg :as p])
(let [line "@@1 the remaining line\n"
      initial (p/token #"@@\d+\s+")]
  (second (p/parse initial line)))
The function parse uses the parser returned by p/token to parse the text in line. It returns a vector whose first element is the result of the parse and whose second element is the remaining input. Therefore, calling second gives us the rest of the line. Running this returns
"the remaining line\n"
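To make the "result plus remaining input" model concrete without the library, here is a minimal hand-rolled sketch. The `token` function below is hypothetical and is not dj-peg's implementation; it only assumes that a token parser matches a regex at the start of the input and returns a vector of the matched text and the rest of the string (or nil when nothing matches).

```clojure
;; Hypothetical `token` helper, NOT dj-peg's code:
;; returns a parser fn that matches `re` at the start of the input
;; and yields [matched-text remaining-input], or nil on failure.
(defn token [re]
  (fn [^String s]
    (let [m (re-matcher re s)]
      (when (.lookingAt m)                 ; match must start at position 0
        [(.group m) (subs s (.end m))]))))

(let [line "@@1 the remaining line\n"
      initial (token #"@@\d+\s+")]
  (second (initial line)))
;; => "the remaining line\n"
```

Using `lookingAt` rather than `find` keeps the match anchored at the beginning of the input, which is what a sequential parser needs.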
I recommend checking out the library. It is written in a pseudo-literate programming style, so the source code reads quite smoothly. You should be able to understand the parsing model after going through the source code.
Upvotes: 1
Reputation: 8344
The previously given answer using line-seq and destructuring of regex groups works well for the given use case.
In the general case, where all you want is string manipulation, clojure.core includes the subs function: http://clojure.github.com/clojure/clojure.core-api.html#clojure.core/subs
subs is implemented using Java interop and the substring method of Java's String class.
user=> (subs "abcdef" 1)
"bcdef"
user=> (subs "abcdef" 2 4)
"cd"
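Applied to the string the question's code produces, a small sketch could combine subs with clojure.string/index-of (available since Clojure 1.8) to drop everything up to and including the first space. The name `strip-row-number` is hypothetical, and the sketch assumes the row number and the text are separated by a single space.

```clojure
(require '[clojure.string :as str])

;; Drop everything up to and including the first space.
;; Assumes "<number> <text>" with a single separating space.
(defn strip-row-number [^String s]
  (if-let [i (str/index-of s " ")]
    (subs s (inc i))
    s))                                    ; no space found: return unchanged

(strip-row-number "1 Row one.")   ;; => "Row one."
```

Unlike a fixed offset such as (subs s 2), this also works for multi-digit row numbers.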
Upvotes: 0
Reputation: 14496
I would define the function the following way:
(require 'clojure.java.io)
(defn parse-text-cms [sel-row]
  (with-open [input (clojure.java.io/reader "cms/tb_cms.txt")]
    (first
      (for [[_ number line] (map (partial re-find #"@@(\d+)\s+(.*)")
                                 (line-seq input))
            :when (= number (str sel-row))]
        line))))
The combination of line-seq and reader gives me a sequence of lines from the input file. with-open ensures that the file is properly closed when I'm done. I apply a regex to each line that looks for @@ followed by a number and some spaces.
re-find returns a vector with three items: the whole match, the row number, and the rest of the line. I bind the last two to number and line using destructuring in a for expression (I'm not interested in the whole matched line, so I ignore it with _). I filter for the selected sel-row using :when and yield only the (rest of the) line.
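A quick REPL check of what re-find hands to the destructuring form, run on lines shaped like the question's sample. The inputs "@@12 Row twelve." and "not a row" are made-up examples. Note that the number group must be written (\d+): a pattern written as (\d)+ still matches multi-digit numbers but captures only the last digit.

```clojure
;; One line of the sample file: [whole-match number rest-of-line]
(re-find #"@@(\d+)\s+(.*)" "@@1 Row one.")
;; => ["@@1 Row one." "1" "Row one."]

;; The same destructuring the for expression uses
(let [[_ number line] (re-find #"@@(\d+)\s+(.*)" "@@12 Row twelve.")]
  [number line])
;; => ["12" "Row twelve."]

;; With (\d)+ the group only keeps the last repetition
(re-find #"@@(\d)+\s+(.*)" "@@12 Row twelve.")
;; => ["@@12 Row twelve." "2" "Row twelve."]

;; A non-matching line gives nil, so number is nil and :when rejects it
(re-find #"@@(\d+)\s+(.*)" "not a row")
;; => nil
```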
Since I only expect one match in the file, I return just the first item from the sequence built by for. Because for, map and line-seq are all lazy, this also stops reading the file once the item is found.
If you do a lot of lookups for rows, I would suggest loading the whole file into memory instead of reading it every time, though.
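One way to do that, sketched under the assumption that every row line has the @@&lt;number&gt; &lt;text&gt; shape: read the file once and build a map from the row number (as a string) to the rest of the line. The name `load-rows` is hypothetical; `into` fully realizes the lines before with-open closes the reader.

```clojure
(require '[clojure.java.io :as io])

;; Read all lines once and build {"1" "Row one.", "2" "Row two.", ...}.
;; Lines that don't match the pattern are skipped by `keep`.
(defn load-rows [source]
  (with-open [input (io/reader source)]
    (into {}                               ; eager: done before the reader closes
          (keep (fn [l]
                  (when-let [[_ number line] (re-find #"@@(\d+)\s+(.*)" l)]
                    [number line])))
          (line-seq input))))

;; Usage, assuming the question's file:
;; (def rows (load-rows "cms/tb_cms.txt"))
;; (get rows (str 1))   ; looks up row 1 without rereading the file
```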
Upvotes: 2