Expressions, Environment and Evaluation

Expressions

LISP is all about expressions. An expression can be a literal, an identifier, or a list consisting of an operator and one or more arguments.

A literal is an object with an intrinsic value. In our system, that's either an integer or NIL (if you consider "nothing" to be a value).

An identifier is a name for an object. Symbols can be identifiers.

Everything else is a list of the form (operator argument...) where argument... means zero or more arguments.

Environment

To associate identifiers with objects we need an environment. This is a collection of bindings, each of which consists of an identifier and its corresponding value. For example:

Bindings

Identifier Value

FOO 42

BAR NIL

BAZ (X Y Z)

Note that the identifiers are all symbols, but the values can be any object within our system of data — the value for BAZ is a list containing three symbols.

Bindings
Identifier	Value
`FOO`	`42`
`BAR`	`NIL`
`BAZ`	`(X Y Z)`

An environment can also have a parent environment. If there is no binding for a particular identifier in the environment, we can check the parent, the parent's parent and so on. In this way we can create a tree of environments which share bindings with their ancestors unless explicit replacements exist.

Implementation

There is a convenient way of representing environments using our LISP data types:

(parent (identifier . value)...)

So the environment above (assuming it has no parent) is:

(NIL (FOO . 42) (BAR . NIL) (BAZ . (X Y Z)))

Here is a function to create an empty environment with a specified parent (which could be NIL):

Atom env_create(Atom parent)
{
	return cons(parent, nil);
}

Next we have two functions to retrieve and create bindings in an environment.

int env_get(Atom env, Atom symbol, Atom *result)
{
	Atom parent = car(env);
	Atom bs = cdr(env);

	while (!nilp(bs)) {
		Atom b = car(bs);
		if (car(b).value.symbol == symbol.value.symbol) {
			*result = cdr(b);
			return Error_OK;
		}
		bs = cdr(bs);
	}

	if (nilp(parent))
		return Error_Unbound;

	return env_get(parent, symbol, result);
}

Disallowing duplicate symbols means that we don't have to call strcmp here, which should mean that this lookup function is not too slow.

int env_set(Atom env, Atom symbol, Atom value)
{
	Atom bs = cdr(env);
	Atom b = nil;

	while (!nilp(bs)) {
		b = car(bs);
		if (car(b).value.symbol == symbol.value.symbol) {
			cdr(b) = value;
			return Error_OK;
		}
		bs = cdr(bs);
	}

	b = cons(symbol, value);
	cdr(env) = cons(b, cdr(env));

	return Error_OK;
}

Only env_get recursively checks the parent environments. We don't want to modify the bindings of parents.

Evaluation

Now that we have expressions, we can start to evaluate them. Evalution is a process which takes an expression and an environment, and produces a value (the result). Let's specify the rules.

A literal will evaluate to itself.
The environment allows us to determine a value for an identifier. Attempting to evaluate an identifier for which no binding exists is an error.
A list expression with one of the following operators is called a special form:

QUOTE

The result of evaluating (QUOTE EXPR) is EXPR, which is returned without evaluating.
DEFINE

Evaluating (DEFINE SYMBOL EXPR) creates a binding for SYMBOL (or modifies an existing binding) in the evaluation environment. SYMBOL is bound to the value obtained by evaluating EXPR. The final result is SYMBOL.
Anything else, including list expressions with any other operator, is invalid.

Implementation

We will need to check whether an expression is a proper list.

int listp(Atom expr)
{
	while (!nilp(expr)) {
		if (expr.type != AtomType_Pair)
			return 0;
		expr = cdr(expr);
	}
	return 1;
}

The Error enumeration needs a few more entires:

Error_Unbound Attempted to evaluate a symbol for which no binding exists

Error_Args A list expression was shorter or longer than expected

Error_Type An object in an expression was of a different type than expected

The function to perform evaluation is now a straightforward translation of the rules into C.

int eval_expr(Atom expr, Atom env, Atom *result)
{
	Atom op, args;
	Error err;

	if (expr.type == AtomType_Symbol) {
		return env_get(env, expr, result);
	} else if (expr.type != AtomType_Pair) {
		*result = expr;
		return Error_OK;
	}

	if (!listp(expr))
		return Error_Syntax;

	op = car(expr);
	args = cdr(expr);

	if (op.type == AtomType_Symbol) {
		if (strcmp(op.value.symbol, "QUOTE") == 0) {
			if (nilp(args) || !nilp(cdr(args)))
				return Error_Args;

			*result = car(args);
			return Error_OK;
		} else if (strcmp(op.value.symbol, "DEFINE") == 0) {
			Atom sym, val;

			if (nilp(args) || nilp(cdr(args)) || !nilp(cdr(cdr(args))))
				return Error_Args;

			sym = car(args);
			if (sym.type != AtomType_Symbol)
				return Error_Type;

			err = eval_expr(car(cdr(args)), env, &val);
			if (err)
				return err;

			*result = sym;
			return env_set(env, sym, val);
		}
	}

	return Error_Syntax;
}

Testing

Extending the read-print loop from the previous chapter, we now have a read-eval-print loop (REPL). This is the core of our LISP interpreter.

int main(int argc, char **argv)
{
	Atom env;
	char *input;

	env = env_create(nil);

	while ((input = readline("> ")) != NULL) {
		const char *p = input;
		Error err;
		Atom expr, result;

		err = read_expr(p, &p, &expr);		

		if (!err)
			err = eval_expr(expr, env, &result);

		switch (err) {
		case Error_OK:
			print_expr(result);
			putchar('\n');
			break;
		case Error_Syntax:
			puts("Syntax error");
			break;
		case Error_Unbound:
			puts("Symbol not bound");
			break;
		case Error_Args:
			puts("Wrong number of arguments");
			break;
		case Error_Type:
			puts("Wrong type");
			break;
		}

		free(input);
	}

	return 0;
}

Let's see what it can do.

> foo
Symbol not bound
> (quote foo)
FOO
> (define foo 42)
FOO
> foo
42
> (define foo (quote bar))
FOO
> foo
BAR

We can now interactively assign names to objects.

`Error_Unbound`	Attempted to evaluate a symbol for which no binding exists
`Error_Args`	A list expression was shorter or longer than expected
`Error_Type`	An object in an expression was of a different type than expected