LISP is all about expressions. An expression can be a literal, an identifier, or a list consisting of an operator and one or more arguments.
A literal is an object with an intrinsic value. In our system, that's
either an integer or NIL (if you consider "nothing" to be
a value).
An identifier is a name for an object. Symbols can be identifiers.
Everything else is a list of the form (operator argument...)
where argument... means zero or more arguments.
To associate identifiers with objects we need an environment. This is a collection of bindings, each of which consists of an identifier and its corresponding value. For example:
| Bindings | |
|---|---|
| Identifier | Value |
FOO | 42 |
BAR | NIL |
BAZ | (X Y Z) |
BAZ
is a list containing three symbols.
An environment can also have a parent environment. If there is no binding for a particular identifier in the environment, we can check the parent, the parent's parent and so on. In this way we can create a tree of environments which share bindings with their ancestors unless explicit replacements exist.
There is a convenient way of representing environments using our LISP data types:
(parent (identifier . value)...)So the environment above (assuming it has no parent) is:
(NIL (FOO . 42) (BAR . NIL) (BAZ . (X Y Z)))
Here is a function to create an empty environment with a specified
parent (which could be NIL):
Atom env_create(Atom parent)
{
return cons(parent, nil);
}
Next we have two functions to retrieve and create bindings in an environment.
int env_get(Atom env, Atom symbol, Atom *result)
{
Atom parent = car(env);
Atom bs = cdr(env);
while (!nilp(bs)) {
Atom b = car(bs);
if (car(b).value.symbol == symbol.value.symbol) {
*result = cdr(b);
return Error_OK;
}
bs = cdr(bs);
}
if (nilp(parent))
return Error_Unbound;
return env_get(parent, symbol, result);
}
Disallowing duplicate symbols means that we don't have to call
strcmp here, which should mean that this lookup function
is not too slow.
int env_set(Atom env, Atom symbol, Atom value)
{
Atom bs = cdr(env);
Atom b = nil;
while (!nilp(bs)) {
b = car(bs);
if (car(b).value.symbol == symbol.value.symbol) {
cdr(b) = value;
return Error_OK;
}
bs = cdr(bs);
}
b = cons(symbol, value);
cdr(env) = cons(b, cdr(env));
return Error_OK;
}
Only env_get recursively checks the parent environments.
We don't want to modify the bindings of parents.
Now that we have expressions, we can start to evaluate them. Evalution is a process which takes an expression and an environment, and produces a value (the result). Let's specify the rules.
QUOTE(QUOTE EXPR) is
EXPR, which is returned without evaluating.
DEFINE(DEFINE SYMBOL EXPR) creates a binding
for SYMBOL (or modifies an existing binding) in the
evaluation environment. SYMBOL is bound to the value
obtained by evaluating EXPR. The final result is
SYMBOL.
We will need to check whether an expression is a proper list.
int listp(Atom expr)
{
while (!nilp(expr)) {
if (expr.type != AtomType_Pair)
return 0;
expr = cdr(expr);
}
return 1;
}
The Error enumeration needs a few more entires:
Error_Unbound |
Attempted to evaluate a symbol for which no binding exists |
Error_Args |
A list expression was shorter or longer than expected |
Error_Type |
An object in an expression was of a different type than expected |
The function to perform evaluation is now a straightforward translation of the rules into C.
int eval_expr(Atom expr, Atom env, Atom *result)
{
Atom op, args;
Error err;
if (expr.type == AtomType_Symbol) {
return env_get(env, expr, result);
} else if (expr.type != AtomType_Pair) {
*result = expr;
return Error_OK;
}
if (!listp(expr))
return Error_Syntax;
op = car(expr);
args = cdr(expr);
if (op.type == AtomType_Symbol) {
if (strcmp(op.value.symbol, "QUOTE") == 0) {
if (nilp(args) || !nilp(cdr(args)))
return Error_Args;
*result = car(args);
return Error_OK;
} else if (strcmp(op.value.symbol, "DEFINE") == 0) {
Atom sym, val;
if (nilp(args) || nilp(cdr(args)) || !nilp(cdr(cdr(args))))
return Error_Args;
sym = car(args);
if (sym.type != AtomType_Symbol)
return Error_Type;
err = eval_expr(car(cdr(args)), env, &val);
if (err)
return err;
*result = sym;
return env_set(env, sym, val);
}
}
return Error_Syntax;
}
Extending the read-print loop from the previous chapter, we now have a read-eval-print loop (REPL). This is the core of our LISP interpreter.
int main(int argc, char **argv)
{
Atom env;
char *input;
env = env_create(nil);
while ((input = readline("> ")) != NULL) {
const char *p = input;
Error err;
Atom expr, result;
err = read_expr(p, &p, &expr);
if (!err)
err = eval_expr(expr, env, &result);
switch (err) {
case Error_OK:
print_expr(result);
putchar('\n');
break;
case Error_Syntax:
puts("Syntax error");
break;
case Error_Unbound:
puts("Symbol not bound");
break;
case Error_Args:
puts("Wrong number of arguments");
break;
case Error_Type:
puts("Wrong type");
break;
}
free(input);
}
return 0;
}
Let's see what it can do.
> foo Symbol not bound > (quote foo) FOO > (define foo 42) FOO > foo 42 > (define foo (quote bar)) FOO > foo BAR
We can now interactively assign names to objects.