kfk

Reputation: 841

Clojure: chop off the first space-separated characters

I want to parse and filter a file that looks like this:

@@1 Row one. 
@@2 Row two.

I have been able to do the filtering of the rows with the following code:

(defn parse-text-cms [sel-row]
  (let [f_data  (st/split  #"@@" (slurp "cms/tb_cms.txt"))] 
  ;(prn (map #(take 1 %) f_data))))
  (filter  #(= (first (take 1 %)) sel-row) f_data)))

However, this code gives me (if sel-row=1):

1 Row one.

I would like to chop off that 1 and the space after it, so that I get:

Row one.

I think there is some sequence magic to do this. I just can't come up with an elegant solution.

Upvotes: 2

Views: 346

Answers (3)

bmillare

Reputation: 4233

Another solution is to use a functional parser library such as dj-peg (which I wrote).

https://github.com/bmillare/dj-peg

Then you can write this:

 (require '[dj-peg :as p])
 (let [line "@@1 the remaining line\n"
       initial (p/token #"@@\d+\s+")]
       (second (p/parse initial line)))

The function parse applies the parser returned by p/token to the text in line. It returns a vector whose first value is the result of the parse and whose second value is the remaining input. Calling second therefore gives us the rest of the line. Running this returns

 "the remaining line\n"

I recommend checking out the library. It is written in pseudo literate programming style so the source code reads quite smoothly. You should be able to understand the parsing model after going through the source code.

Upvotes: 1

Alex Stoddard

Reputation: 8344

The previously given answer using line-seq and destructuring of the regex groups works well for the given use case.

In the general situation where all you want is string manipulation, clojure.core includes the subs function. http://clojure.github.com/clojure/clojure.core-api.html#clojure.core/subs

subs is implemented using Java interop and the substring method of the Java String class.


user=> (subs "abcdef" 1)
"bcdef"
user=> (subs "abcdef" 2 4)
"cd"

Upvotes: 0

Christian Berg

Reputation: 14496

I would define the function the following way:

(defn parse-text-cms [sel-row]
  (with-open [input (clojure.java.io/reader "cms/tb_cms.txt")]
    (first
     (for [[_ number line] (map (partial re-find #"@@(\d+)\s+(.*)")
                                (line-seq input))
           :when (= number (str sel-row))]
       line))))

The combination of line-seq and reader gives me a sequence of lines from the input file. with-open ensures that the file is properly closed when I'm done. I apply a regex to each line that looks for @@ followed by a number and some spaces.

re-find returns a vector with three items:

  • the whole matched line
  • the number (the first group in the regex)
  • the rest of the line (the second group in the regex)
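
To make the shape of that vector concrete, here is a small REPL-style sketch using a sample line from the question. It writes the first group as (\d+) so that a multi-digit row number is captured whole; sample-match is just an illustrative name.

```clojure
;; Sample line taken from the question.
(def sample-match
  (re-find #"@@(\d+)\s+(.*)" "@@1 Row one."))
;; sample-match is ["@@1 Row one." "1" "Row one."]

;; Destructuring skips the whole match and names the two groups.
(let [[_ number line] sample-match]
  [number line])
;; => ["1" "Row one."]

;; A line that does not match makes re-find return nil,
;; so both names are bound to nil and the :when test fails.
(let [[_ number line] (re-find #"@@(\d+)\s+(.*)" "no marker here")]
  [number line])
;; => [nil nil]
```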

I bind these to number and line using destructuring in a for statement (I'm not interested in the whole matched line, so I ignore that). I filter for the selected sel-row using :when and yield only the (rest of the) line.

Since I only expect one match in the file, I return just the first item from the sequence built by for. Because for, map and line-seq are all lazy, this also stops reading the file as soon as the item is found.

If you do a lot of lookups for rows, I would suggest loading the whole file into memory instead of reading it every time, though.
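
A rough sketch of that idea, assuming the file is small enough to slurp and that every row sits on its own line in the @@N format from the question. parse-rows and rows are hypothetical names; the map is keyed by the row number as a string.

```clojure
(require '[clojure.string :as str])

(defn parse-rows
  "Builds a map from row number (as a string) to row text, given the
  full contents of the file as one string. Hypothetical helper."
  [text]
  (into {}
        ;; (?m) lets ^ and $ match at every line boundary;
        ;; [ \t]+ keeps the separator from running across a newline.
        (for [[_ number line] (re-seq #"(?m)^@@(\d+)[ \t]+(.*)$" text)]
          [number (str/trim line)])))

(comment
  ;; Read the file once, then look rows up as often as needed.
  (def rows (parse-rows (slurp "cms/tb_cms.txt")))
  (get rows (str 1)))
```

With the two sample lines from the question, (get (parse-rows "@@1 Row one. \n@@2 Row two.\n") "1") gives "Row one.", and a missing row number gives nil.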

Upvotes: 2
