Data

We will define four kinds of object to begin with:

Integer: A number. For example: 3, -9, 0.
Symbol: A name consisting of a string of characters. For example: FOO, BAR, ADD-TWO. We will normalize characters to upper-case in this project, but this is not strictly necessary.
NIL: Represents "nothing". A bit like NULL in C and other languages.
Pair: A pair consists of two elements, which for historical reasons are called car and cdr. Both can hold either an integer, a symbol, NIL, or a reference to another pair. The types of each element may be different.

Integers, symbols and NIL are called simple data. The term atom can refer to either a simple datum or a pair (purists may disagree on this point).

Note that integers and symbols are immutable, so we can think of two integers with the same value as being the same object. This is particularly useful for symbols, because it allows us to test for equality by comparing pointers.

Implementation

Let's declare some C types to hold our data. There are many clever ways to store LISP objects efficiently, but for this implementation we will stick to a very simple scheme [please excuse the pun].

struct Atom {
	enum {
		AtomType_Nil,
		AtomType_Pair,
		AtomType_Symbol,
		AtomType_Integer
	} type;

	union {
		struct Pair *pair;
		const char *symbol;
		long integer;
	} value;
};

struct Pair {
	struct Atom atom[2];
};

typedef struct Atom Atom;

A few macros will be handy:

#define car(p) ((p).value.pair->atom[0])
#define cdr(p) ((p).value.pair->atom[1])
#define nilp(atom) ((atom).type == AtomType_Nil)

static const Atom nil = { AtomType_Nil };

The "p" in nilp stands for "predicate". Identifiers in C may not contain question marks. There is no need to restrict our LISP implementation in that way, of course.

Integers and (pointers to) strings can be copied around, but we need to allocate pairs on the heap.

Atom cons(Atom car_val, Atom cdr_val)
{
	Atom p;

	p.type = AtomType_Pair;
	p.value.pair = malloc(sizeof(struct Pair));

	car(p) = car_val;
	cdr(p) = cdr_val;

	return p;
}

cons is a function to allocate a pair on the heap and assign its two elements.

At this point you will have noticed that using cons will leak memory the moment its return value is discarded. We will deal with that later. Of course, if you are using a garbage-collected language then the problem is already taken care of.

Testing

Now we can start creating LISP objects. An integer:

Atom make_int(long x)
{
	Atom a;
	a.type = AtomType_Integer;
	a.value.integer = x;
	return a;
}

And a symbol:

Atom make_sym(const char *s)
{
	Atom a;
	a.type = AtomType_Symbol;
	a.value.symbol = strdup(s);
	return a;
}

Textual representation

We will write a pair like this:

(a . b)

where a is the car and b is the cdr.

By using the cdr of a pair to reference another pair, we can create a chain:

(a . (b . (c . (d . NIL))))

Notice that the cdr of the last pair is NIL. This signifies the end of the chain, and we call this structure a list. To avoid having to write a large number of brackets, we will write the previous list like this:

(a b c d)

Finally, if the cdr of the last pair in a list is not NIL, we will write this:

(p q . r)

which is equivalent to

(p . (q . r))

This is called an improper list.

Implementation

Printing an atom or list is simple.

void print_expr(Atom atom)
{
	switch (atom.type) {
	case AtomType_Nil:
		printf("NIL");
		break;
	case AtomType_Pair:
		putchar('(');
		print_expr(car(atom));
		atom = cdr(atom);
		while (!nilp(atom)) {
			if (atom.type == AtomType_Pair) {
				putchar(' ');
				print_expr(car(atom));
				atom = cdr(atom);
			} else {
				printf(" . ");
				print_expr(atom);
				break;
			}
		}
		putchar(')');
		break;
	case AtomType_Symbol:
		printf("%s", atom.value.symbol);
		break;
	case AtomType_Integer:
		printf("%ld", atom.value.integer);
		break;
	}
}

By using recursion we can print aribtrarily complex data structures. (Actually that's not true: for a very deeply nested structure we will run out of stack space, and a self-referencing tree will never finish printing).

Testing

See what print_expr does with various atoms:

Atom Output

make_int(42) 42
make_sym("FOO") FOO
cons(make_sym("X"), make_sym("Y")) (X . Y)
cons(make_int(1), cons(make_int(2), cons(make_int(3), nil))) (1 2 3)

Atom	Output
`make_int(42)`	`42`
`make_sym("FOO")`	`FOO`
`cons(make_sym("X"), make_sym("Y"))`	`(X . Y)`
`cons(make_int(1), cons(make_int(2), cons(make_int(3), nil)))`	`(1 2 3)`

All this is pretty trivial. We'll get on to some more interesting stuff in the next chapter.

One last thing

Remember we said that we would treat identical symbols as being the same object? We can enforce that by keeping track of all the symbols created, and returning the same atom if the same sequence of characters is requested subsequently.

Languages with a set or hashtable container make this easy, but we can use the LISP data structures already implemented to store the symbols in a list:

static Atom sym_table = { AtomType_Nil };

Atom make_sym(const char *s)
{
	Atom a, p;

	p = sym_table;
	while (!nilp(p)) {
		a = car(p);
		if (strcmp(a.value.symbol, s) == 0)
			return a;
		p = cdr(p);
	}

	a.type = AtomType_Symbol;
	a.value.symbol = strdup(s);
	sym_table = cons(a, sym_table);

	return a;
}

Neat, huh? It's not particularly efficient, but it will do fine for now.