justinrixx

Reputation: 801

Read a file into a list of pairs in elisp

I am trying to write an elisp function to read each word in a file into a pair. I want the first item of the pair to be the string sorted lexicographically, and the second item to be untouched.

Given the example file:

cat
cow
dog

I want the list to look like:

(act cat)
(cow cow)
(dgo dog)

My best crack at it is:

(defun get-file (filename)
  (with-open-file (stream filename)
    (loop for word = (read-line stream nil)
          while word
          collect ((sort word #'char-lessp) word))))

It compiles correctly in Emacs lisp interaction mode. However, when I try to run it by executing

(get-file "~/test.txt")

I end up in the Emacs debugger, and it's not telling me anything useful . . .

Debugger entered--Lisp error: (void-function get-file)
  (get-file "~/test.txt")
  eval((get-file "~/test.txt") nil)
  eval-last-sexp-1(t)
  eval-last-sexp(t)
  eval-print-last-sexp(nil)
  call-interactively(eval-print-last-sexp nil nil)
  command-execute(eval-print-last-sexp)

I am a lisp beginner, and have no idea what is wrong.

Thanks,

Justin

Upvotes: 4

Views: 2021

Answers (2)

Mirzhan Irkegulov

Reputation: 18055

Vanilla Emacs

First, let's use only Emacs's built-in functions. There's no built-in function that sorts a string directly, so you should first convert the string to a list, sort the list, then convert the sorted list back to a string. This is how you convert a string to a list:

(append "cat" nil) ; => (99 97 116)

A string converted to a list becomes a list of characters, and characters are represented as numbers in Elisp. Then you sort the list and convert it to a string:

(concat (sort (append "cat" nil) '<)) ; => "act"

There's no built-in function to load file contents directly into a variable, but you can load them into a temporary buffer. Then you can return the entire temporary buffer as a string:

(with-temp-buffer
  (insert-file-contents-literally "file.txt")
  (buffer-substring-no-properties (point-min) (point-max)))

This will return the string "cat\ncow\ndog\n", so you'll need to split it:

(split-string "cat\ncow\ndog\n") ; => ("cat" "cow" "dog")

Now you need to traverse this list and convert each item into a pair of sorted item and original item:

(mapcar (lambda (animal)
          (list (concat (sort (append animal nil) '<)) animal))
        '("cat" "cow" "dog"))
;; returns
;; (("act" "cat")
;;  ("cow" "cow")
;;  ("dgo" "dog"))

Full code:

(mapcar
 (lambda (animal)
   (list (concat (sort (append animal nil) '<)) animal))
 (split-string
  (with-temp-buffer
    (insert-file-contents-literally "file.txt")
    (buffer-substring-no-properties (point-min) (point-max)))))
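To get something closer to the get-file function from the question, the same steps can be wrapped in a function that takes the file name as an argument. This is only a sketch: the name my-sorted-word-pairs is made up for illustration, and it assumes the file holds whitespace-separated words.

```elisp
;; Hypothetical helper name; not part of the original answer.
(defun my-sorted-word-pairs (filename)
  "Return a list of (SORTED-WORD WORD) pairs, one per word in FILENAME."
  (mapcar
   (lambda (word)
     ;; string -> fresh list of characters -> sorted list -> string
     (list (concat (sort (append word nil) #'<)) word))
   ;; With default arguments, split-string splits on whitespace
   ;; and drops empty strings.
   (split-string
    (with-temp-buffer
      (insert-file-contents-literally filename)
      (buffer-substring-no-properties (point-min) (point-max))))))
```

With the example file from the question saved as ~/test.txt, evaluating (my-sorted-word-pairs "~/test.txt") should give (("act" "cat") ("cow" "cow") ("dgo" "dog")).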

Common Lisp Emulation

One of Emacs's built-in packages is cl-lib (the successor to the older cl.el), which provides the cl- prefixed functions used below, and there's no reason not to use it in your code. So I lied when I said that there are no built-in functions to sort strings and that the above is the only way to do the task using built-in functions. Load it with (require 'cl-lib) and let's use it.

cl-sort sorts a string (or any sequence):

(cl-sort "cat" '<) ; => "act"

cl-mapcar is more versatile than Emacs's built-in mapcar, but here you can use either of them.

There is a problem with cl-sort: it is destructive, meaning it may modify its argument in place. We use the local variable animal twice inside the anonymous function, and we don't want to garble the original animal. Therefore we should pass a copy of the sequence to cl-sort:

(lambda (animal)
  (list (cl-sort (copy-sequence animal) '<) animal))
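As a small check that the copy protects the original, the lambda can be given a name and called on a string that is inspected afterwards. The name my-sorted-pair is hypothetical and only used for this illustration.

```elisp
(require 'cl-lib)

;; Hypothetical helper name, used only for this illustration.
(defun my-sorted-pair (word)
  "Return (SORTED-WORD WORD) without modifying WORD."
  ;; cl-sort is documented as destructive, so sort a copy instead.
  (list (cl-sort (copy-sequence word) #'<) word))

(let ((word (copy-sequence "dog")))
  (my-sorted-pair word)   ; => ("dgo" "dog")
  word)                   ; => "dog", the original is untouched
```

Since cl-sort is documented as reusing its argument's storage where possible, code should not rely on the argument surviving a call without the copy.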

The resulting code becomes:

(cl-mapcar
 (lambda (animal)
   (list (cl-sort (copy-sequence animal) '<) animal))
 (split-string
  (with-temp-buffer
    (insert-file-contents-literally "file.txt")
    (buffer-substring-no-properties (point-min) (point-max)))))

seq.el

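In Emacs 25 a new sequence manipulation library, seq.el, was added. Its alternative to mapcar is seq-map, and its alternative to CL's cl-sort is seq-sort. Note that seq-sort takes the predicate first and the sequence second, and that it returns a sorted copy, so no copy-sequence is needed. The full code becomes:

(seq-map
 (lambda (animal)
   (list (seq-sort '< animal) animal))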
 (split-string
  (with-temp-buffer
    (insert-file-contents-literally "file.txt")
    (buffer-substring-no-properties (point-min) (point-max)))))
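For a quick check of seq-sort on its own, here is a minimal sketch with a single string. It relies on seq-sort returning a new sequence of the same type as its input rather than sorting in place.

```elisp
(require 'seq)

(let* ((word "dog")
       ;; Predicate first, sequence second.
       (sorted (seq-sort #'< word)))
  (list sorted word))
;; => ("dgo" "dog")
```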

dash, s, f

Usually the best solution to work with sequences and files is to reach directly for these 3 third-party libraries:

  • dash for list manipulation
  • s for string manipulation
  • f for file manipulation.

Their GitHub pages explain how to install them (installation is very simple). However, for this particular problem they are a bit suboptimal. For example, -sort from dash only sorts lists, so we would have to go back to our string->list->string conversion:

(concat (-sort '< (append "cat" nil))) ; => "act"

s-lines from s leaves empty strings in its result. On GNU/Linux, text files usually end with a newline, so splitting your file would look like:

(s-lines "cat\ncow\ndog\n") ; => ("cat" "cow" "dog" "")

s-split supports an optional argument to omit empty strings, but its separator argument is a regex (note that you need both \n and \r for portability):

(s-split "[\n\r]" "cat\ncow\ndog\n" t) ; => ("cat" "cow" "dog")

Yet there are 2 functions which can simplify our code. -map is similar to mapcar:

(-map
  (lambda (animal)
    (list (cl-sort (copy-sequence animal) '<) animal))
  '("cat" "cow" "dog"))
;; returns
;; (("act" "cat")
;;  ("cow" "cow")
;;  ("dgo" "dog"))

However, dash also has anaphoric versions of the functions that accept a function as an argument, such as -map. The anaphoric versions start with two dashes and allow a shorter syntax by exposing the current element as the local variable it. For example, the two forms below are equivalent:

(-map (lambda (x) (+ x 1)) '(1 2 3)) ; => (2 3 4)
(--map (+ it 1) '(1 2 3)) ; => (2 3 4)

Another improvement is f-read-text from f, which simply returns contents of a file as a string:

(f-read-text "file.txt") ; => "cat\ncow\ndog\n"

Combine best of all worlds

(--map (list (cl-sort (copy-sequence it) '<) it)
       (split-string (f-read-text "file.txt")))

Upvotes: 8

coredump

Reputation: 38799

On my Emacs, either C-j or C-x C-e evaluates the form as you said. When I try to do the same with (get-file "test"), the debugger complains about with-open-file being undefined. I cannot find with-open-file in the cl-lib (or cl) Emacs packages. Did you require some other package? Also, I think the idiomatic way of opening a file in Emacs is to temporarily visit it in a buffer. Anyway, if the code were Common Lisp, it would be fine except for collect ((sort ...) word), where you are not building a list but using (sort ...) in function position. I'd use (list (sort ...) word) instead.
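To illustrate the last point in Emacs Lisp rather than Common Lisp, here is a rough sketch using cl-loop from cl-lib in place of loop. The function name my-loop-pairs is made up, and it takes a list of words instead of reading a file, so it only shows the collect part.

```elisp
(require 'cl-lib)

;; Hypothetical helper name, used only for this illustration.
(defun my-loop-pairs (words)
  "Return a list of (SORTED-WORD WORD) pairs for the strings in WORDS."
  (cl-loop for word in words
           ;; (list ...) builds the pair. Writing ((sort ...) word)
           ;; would put the (sort ...) form in function position.
           collect (list (concat (sort (append word nil) #'<)) word)))

(my-loop-pairs '("cat" "cow" "dog"))
;; => (("act" "cat") ("cow" "cow") ("dgo" "dog"))
```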

Upvotes: 1
