Reputation: 6014

Are Lisp source code files themselves lists?

No matter the Lisp dialect, it looks like every source code file containing Lisp functions isn't itself a list (the first time I was "surprised" by this was when working on my Emacs .el files).

I've got a few questions but they're all related to the same "issue" and it's probably just me misunderstanding a few things.

Is there a reason why source code files for the various Lisp dialects seem to be a bunch of "disorganized" functions like this:

(function1 ...)
(function2 ...)
(function3 ...)

Instead of a "Lisp list" of functions, maybe like this:

(
  '(function1 ...)
  '(function2 ...)
  '(function3 ...)
)

Given this whole "code is data, data is code" thing, I'm a bit surprised to see that source code files themselves apparently aren't neat lists... Or are they!?

Are the source code files something you're supposed to "manipulate" or not?

What if I wanted to, say, convert one of my .clj (Clojure) source files to some CSS+HTML webpage? Isn't it a "problem" that the source code file apparently isn't itself a list?

I'm beginning with Lisp so I don't know if my question makes sense or not and any explanation would be welcome.

Upvotes: 10

Answers (7)

bmillare

Reputation: 4233

To be thorough, all the source files are text, not Lisp data structures. To evaluate or compile the code, the Lisp must first READ the file, which means transforming the text into Lisp data structures. Recall the acronym REPL, whose first two letters stand for READ and EVAL. READ takes a string representation of the code and returns a data structure representing the code. EVAL takes the returned data structure and interprets (or compiles and runs) it as code. Thus, it's important to remember that there are intermediate steps involved.

A good question is, what happens when multiple s-expressions are passed to READ, and they are not in a list, as you mentioned?

If you look at the code, you'll usually find multiple versions of READ. Clojure's read-string reads and returns only the first s-expression, ignoring the rest. The reader used by Clojure's load-file, however, takes the whole string and "effectively" (implementations may differ) wraps an implicit do (or progn in Common Lisp) around all of the forms, then passes that to eval. This contrasts with the REPL, where forms are read, evaluated, and printed one at a time.

In both cases though, this "behind the scene" behavior is a trade-off made for concision. We can assume when we load a file of text of s-expressions, we want them all to be evaluated, and at most return the value of the last s-expression.
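The "evaluate everything, return the last value" behavior can be sketched in a few lines. This is a minimal illustration in Common Lisp, not how Clojure's load-file is actually implemented; the names eval-all-forms and *demo-x* are made up for the example.

```lisp
;; Hypothetical helper: read every form in TEXT, evaluate each one in
;; order, and return the value of the last form (like an implicit PROGN).
(defun eval-all-forms (text)
  (let ((eof (list nil))   ; unique object used as the end-of-input marker
        (result nil))
    (with-input-from-string (in text)
      (loop for form = (read in nil eof)
            until (eq form eof)
            do (setf result (eval form))))
    result))

;; A reader call on its own returns only the first form in the string:
(read-from-string "(+ 1 2) (+ 3 4)")   ; first value is the list (+ 1 2)

;; The helper evaluates both forms and returns the last value:
(eval-all-forms "(defparameter *demo-x* 10) (* *demo-x* 2)")   ; => 20
```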

Upvotes: 6

Rainer Joswig

Reputation: 139261

In Common Lisp a source file contains lisp forms and comments. Lisp forms are either data or Lisp code. Typical operations on a source file are done by the functions LOAD and COMPILE-FILE.

LOAD would read forms from a file and execute them one by one.

COMPILE-FILE is much more complex. It typically reads forms and compiles them to some other representation (machine code, byte code, C code, ...). It does not execute the code.

How would it help you if the file contained one list of forms instead of just multiple forms below each other?

you would have one level of added parentheses
you would have to read the whole list before you can do anything with it (or alternatively you need a different reader mechanism)
adding forms to the end of a file by a program would be a pain
you can't add something into the file which changes the reader interpretation of the rest of the file
files can't be infinitely long for LOAD
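The point about changing the reader's interpretation of the rest of the file can be made concrete. In the sketch below (the function name is hypothetical), the first form switches the reader to base 16, so the second token is read differently than it would have been if the whole text had been read up front as one list. *read-base* is a standard variable; it is rebound here so the change does not leak out of the call.

```lisp
;; Hypothetical helper: read and evaluate forms one at a time, so an
;; earlier form can change how the later text is read.
(defun eval-forms-one-by-one (text)
  (let ((*read-base* 10)   ; start in decimal; rebinding keeps changes local
        (eof (list nil))   ; unique end-of-input marker
        (result nil))
    (with-input-from-string (in text)
      (loop for form = (read in nil eof)
            until (eq form eof)
            do (setf result (eval form))))
    result))

;; After the SETQ is evaluated, the token ff is read as a hexadecimal number.
(eval-forms-one-by-one "(setq *read-base* 16) ff")   ; => 255
```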

For example, a compiler would read Lisp forms from a file stream and compile them piece by piece.

If you want all forms you can do

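CL-USER 170 > (defun read-forms (file)
               (with-open-file (stream file)
                 ;; A unique EOF marker, so a literal NIL in the file
                 ;; does not end the loop early.
                 (loop with eof = (list nil)
                       for form = (read stream nil eof)
                       until (eq form eof)
                       collect form)))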
READ-FORMS

CL-USER 171 > (read-forms (capi:prompt-for-file "source file"))
((DEFPARAMETER *UNITS-TO-SHOW* 4.1)
 (DEFPARAMETER *TEXT-WIDTH-IN-PICAS* 28.0)
 (DEFPARAMETER *DEVICE-PIXELS-PER-INCH* 300)
 (DEFPARAMETER *PIXELS-PER-UNIT* (* (/ (/ *TEXT-WIDTH-IN-PICAS* 6)
                                       (* *UNITS-TO-SHOW* 2))
                                    *DEVICE-PIXELS-PER-INCH*))
...

If you want to put parentheses around everything use PROGN:

 (progn
   'form-1
   (defun function-definition-form () )
   42)

PROGN also preserves the 'top-level-ness' of its subforms.

Side note: alternatives to this have been explored in Lisp for decades. The most prominent example is the now defunct Interlisp-D from Xerox. Interlisp-D was developed in parallel to Smalltalk by Xerox PARC. Interlisp-D originally used a structure editor to edit Lisp data, and the source code was edited as such Lisp data. The development environment was based on this idea. But in the long run 'source as text' won. Still, you can emulate some of that in many current Lisp environments. For example, many Lisp systems allow you to write an 'image' of the current execution memory; this image includes all data and all code (including the compiled code). So you can work on this data/code and save an image from time to time.

Upvotes: 14

Will Ness

Reputation: 71065

In the beginning (of Lisp) there was an interactive REPL: read, then evaluate, then print results and ask again, loop. You'd type some text at the prompt. The run-time system would "read" it, converting the text into its internal representation of "code", and then evaluate ("execute" or whatever) it:

> (setq s "(setq a 2)")
"(setq a 2)"
> (type-of s)          ; s is just a bunch of text characters
(SIMPLE-BASE-STRING 10)
> (setq r (read (make-string-input-stream s)))
(SETQ A 2)
> (type-of r)          ; the result of reading is Lisp data - a CONS cell
CONS                   ;     - - - - - - - - -    ~~~~~~~~~
> (type-of 'a)         ; A is just a symbol
SYMBOL
> (type-of a)          ; ERROR: A has no value    
*** - EVAL: variable A has no value

> (eval r)             ; now what? The data got treated as code.
2                      ;               ~~~~                ~~~~
> a                    ; 'A' has got its value
2
> (setf (caddr r) 4)   ; alter the Lisp data object! that is 
4                      ;  the value of a symbol 'r'
> (eval r)             ; execute the altered data, as 
4                      ;  new version of code
> a
4

So you see, "s-expressions", AST and the like are abstractions, which are represented by concrete, simple, basic Lisp data objects, in Lisp.

Now source files are nothing mysterious; they just relieve us from having to type our definitions at the REPL over and over again. How the contents of source files are read is entirely arbitrary and up to the concrete implementation. You could easily have implementations that would read Python, Haskell, or C-like syntax files too.

Of course the Common Lisp standard defines how a compliant implementation should read Common Lisp source files. But your system could define some additional formats as valid to be read as well. It is not constrained to represent them all in Lisp's list-like syntax, and even less so as one giant list. It is free to treat the source text however it wishes.

Upvotes: 3

6502

Reputation: 114511

In Lisp there are two levels of source code, or there is no source code at all depending on how you define source code.

The two levels are present because two separate conceptual steps are performed (normally) by a Lisp interpreter/compiler.

First step: "reading"

In this step the source code is a sequence of characters, for example coming from a file. Here the parentheses, quoted strings, numbers, symbols, quote signs and even part of the quasiquoting syntax are processed and transformed into Lisp data structures. At this level the syntax rules are about parentheses, digits, pipes, quotes, semicolons, sharp signs, commas, at-signs and so on.

Second step: "compiling"/"interpreting"

In this step the input is Lisp data structures and the output is either machine code or byte code, or possibly the source is directly executed by an interpreter. At this level the syntax is about the meaning of special forms... e.g. (if ...), (labels ...), (symbol-macrolet ...) and so on. The structure is uniform in Lisp code (just lists and atoms) but the semantics aren't (if forms look like function calls, but they are not).
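A small sketch of "uniform structure, non-uniform semantics" in Common Lisp: after reading, an if form and a function call are both plain lists, and only the second step tells them apart. The variable names are invented for the example; special-operator-p and macro-function are standard functions.

```lisp
;; Step 1 (reading): both texts become ordinary lists.
(defparameter *if-form*   (read-from-string "(if t 1 2)"))
(defparameter *call-form* (read-from-string "(list t 1 2)"))

(consp *if-form*)     ; true, it is just a list
(consp *call-form*)   ; true, it is just a list

;; Step 2 (compiling/interpreting): the first element decides the meaning.
(special-operator-p (first *if-form*))     ; true: IF is a special operator
(special-operator-p (first *call-form*))   ; NIL:  LIST is an ordinary function
(macro-function 'when)                     ; non-NIL: WHEN is a macro

(eval *if-form*)     ; => 1
(eval *call-form*)   ; => (T 1 2)
```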

So in this view the answer to your question is yes and no. No for step 1, yes for step 2. If you consider only files then the answer is no... files contain characters, not lists. Those characters can be transformed by a reader into lists.

Lisp has no syntax

Why, then, do some people say that Lisp has no syntax when in fact it has two different syntax levels? The reason is that both of these levels are under the control of the programmer.

You can customize level 1 by defining reader macros, and you can customize level 2 by defining macros. So Lisp has no fixed syntax, and therefore a source file can begin with a "lispy" look and can end looking exactly like Python code.

A source file can contain anything (from a certain point on) because the initial forms could define some new reading rules that will change the meaning of following characters.
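As a rough illustration of customizing level 1, the sketch below teaches the Common Lisp reader that [a b c] means (list a b c). The function names are invented for the example, and the change is confined to a copied readtable so the global reader is left alone; a source file would typically install such a macro in one of its early forms and use the new syntax afterwards.

```lisp
;; Reader macro function (hypothetical name): called when the reader
;; sees "[", it reads forms up to the matching "]".
(defun read-bracket-list (stream char)
  (declare (ignore char))
  (cons 'list (read-delimited-list #\] stream t)))

;; Hypothetical helper: read one form from STRING with the bracket
;; syntax enabled, using a private copy of the standard readtable.
(defun read-with-brackets (string)
  (let ((*readtable* (copy-readtable nil)))
    (set-macro-character #\[ #'read-bracket-list)
    ;; Make "]" terminate tokens the same way ")" does.
    (set-macro-character #\] (get-macro-character #\)))
    (values (read-from-string string))))

(read-with-brackets "[1 2 (+ 1 2)]")          ; => (LIST 1 2 (+ 1 2))
(eval (read-with-brackets "[1 2 (+ 1 2)]"))   ; => (1 2 3)
```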

Normally Lisp programmers don't do crazy things with the reading level so most Lisp source code files look just like sequences of Lisp forms and they remain "lispy".

But this is not a hard constraint... for example I was not joking about Lisp syntax morphing into Python: someone did exactly that.

Upvotes: 6

Jeremy

Reputation: 22415

Source code files are just a convenient place to store your lists. Lisp code (in general) is intended to be executed in a read-eval-print loop (REPL) where each input is itself a list. So when you execute a source code file, you can think of each list in it as being read into a REPL one by one. The idea is that you have a fully interactive environment which complements the "code is data" paradigm.

Surely you can treat a file as one mega-list, but then you are implying that the file has a well-defined structure, which isn't always the case. If you really want to create a file that contains one huge list, then there is nothing stopping you from doing that. You can use the Lisp reader to read it in as one large list (of data?) and process it (maybe using some sort of eval?) as you require. Take, for example, Leiningen's project.clj files. They generally are just one big defproject list.
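Reading "one big list" as plain data might look like the sketch below. It is written in Common Lisp rather than Clojure, the project text is a made-up stand-in that only imitates the shape of a defproject form, and the helper name is hypothetical. Nothing is evaluated; the list is simply taken apart.

```lisp
;; Hypothetical helper: read a single form as data. Binding *READ-EVAL*
;; to NIL disables the #. reader syntax, which is sensible for data files.
(defun read-single-list (text)
  (let ((*read-eval* nil))
    (values (read-from-string text))))

;; Made-up project description, for illustration only.
(defparameter *project*
  (read-single-list
   "(defproject demo-app \"0.1.0\" :description \"hypothetical\")"))

(first *project*)                        ; => DEFPROJECT
(third *project*)                        ; => "0.1.0"
(getf (cdddr *project*) :description)    ; => "hypothetical"
```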

Upvotes: 10

BRPocock

Reputation: 13914

The variation you suggest — having a list of quoted lists — is probably reflecting what (IMHO) is the single most confusing thing about Lisp ☺ — quoting!

The essential idea goes something like this:

The compiler (or interpreter) passes through your input (REPL or source file). Each list is then evaluated as a “form.” Most forms (lists) are going to be of a type like defun. Evaluating a defun form causes a change in the symbol table (which is a topic for another discussion): it defines a function based upon the symbolic name that is in the form. ((defun foo (bar) (print bar)) defines that the symbol table should have an entry for foo that evaluates to (lambda (bar) (print bar)), effectively.)

These lists are not quoted, because we want them to be immediately evaluated. Quoting with '(…) or (quote …) is meant to prevent the compiler/REPL from evaluating something immediately.
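A short Common Lisp sketch of the difference (the variable names are invented for the example): reading a quoted list produces a (quote ...) form, and evaluating that form just hands back the inner list instead of running it.

```lisp
(defparameter *plain*  (read-from-string "(+ 1 2)"))
(defparameter *quoted* (read-from-string "'(+ 1 2)"))

(first *quoted*)   ; => QUOTE, because 'x is read as (quote x)

(eval *plain*)     ; => 3        the list is treated as code
(eval *quoted*)    ; => (+ 1 2)  the quote prevents evaluation
```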

The output of your compiler (depending upon which one it is) is generally going to be some kind of binary or bytecode that contains all of those functions that you've defined; or, perhaps, just the ones that are eventually referenced by a "main function" of some kind.

If you provided something like:

 (
         '(defun foo (bar) (print bar))
 )

Your compiler would try to evaluate the first element of the outer list, which is a quoted defun special form (or macro), and not have anything to do.

Nonetheless, you can do things like read in a Lisp source file using read and not eval it, to do exactly as you say: generate an HTML "copy" or similar.
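As a rough sketch of that idea in Common Lisp, the code below reads forms from a string without evaluating them and wraps the printed text of each in a pre element. The function names are hypothetical, the HTML is deliberately bare, and a real converter would probably work on the original text to keep comments and formatting, which the reader discards.

```lisp
;; Hypothetical helper: escape the characters that are special in HTML.
(defun escape-html (text)
  (with-output-to-string (out)
    (loop for ch across text
          do (case ch
               (#\< (write-string "&lt;" out))
               (#\> (write-string "&gt;" out))
               (#\& (write-string "&amp;" out))
               (t   (write-char ch out))))))

;; Hypothetical helper: READ each form (no EVAL) and emit one <pre> per form.
(defun forms-to-html (source)
  (let ((*read-eval* nil)      ; do not run #. code while reading
        (*print-pretty* nil)   ; print each form on a single line
        (eof (list nil)))      ; unique end-of-input marker
    (with-output-to-string (out)
      (with-input-from-string (in source)
        (loop for form = (read in nil eof)
              until (eq form eof)
              do (format out "<pre>~A</pre>~%"
                         (escape-html (prin1-to-string form))))))))

(forms-to-html "(defun foo (x) (< x 1))")
```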

Once you delve into funcall and defmacro, understanding where all those quotes belong (and, even better, the backquote-comma quote-unquote paradigm) will probably take a while to get used to…

Upvotes: 2

Julien Chastang

Reputation: 17774

In Lisp, you program directly to the abstract syntax tree, which is expressed as a nested list. Since Lisp is expressed in terms of its own list data structures, macros fall out as a consequence, because those lists can be programmatically modified. I suppose the very top-level list is implied, which is why, at least in Clojure, you do not see programs starting and ending with ( ... ).
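A tiny example of a macro as list manipulation, written in Common Lisp (the macro name infix is made up for illustration): it receives the pieces of the form as data and builds a new list, which is then compiled as code.

```lisp
;; Hypothetical macro: rearrange (infix a op b) into (op a b).
(defmacro infix (left operator right)
  (list operator left right))

(macroexpand-1 '(infix 1 + 2))   ; => (+ 1 2)
(infix 1 + 2)                    ; => 3
```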

Upvotes: 0
