Minimal features required for an extensible language

Started by Aeneas Mackenzie

Aeneas Mackenzie

I am interested in languages which do not need very much support for a functional environment. The key examples are Forth and Lisp with a simple evaluator. See here for the scale of a barebones lisp (the original, in fact), although forth can get smaller in terms of actual implemented size. Are there other models of small interpreters? I'm especially interested in real departures from either of these models.

ddrone

I'm a bit confused about your question, what exactly you mean by "functional environment" in your question? The examples you give are wildly different and do not help to clarify it.

Kartik Agaram

Yes, I'd appreciate more elaboration as well. This question seems close to subjects I care about, so I'll speculatively talk about them in hopes that I'm helping somehow.

It's definitely cool how Lisp and Forth allow you to implement a basic language with very little code. But after years in the Lisp community and a few months in the Forth community I've started feeling that there's a bit of sleight of hand in each case:

  1. Lisp allows you to implement it quickly, but the original metacircular evaluator is itself in Lisp. So it's evidence that the programming model is powerful (important in 1956, but uncontested today), not that it's easy to implement. Functional environments in Lisp are not smaller than other languages.
     
    I spent a while trying to build the ideal interpreted language based on Lisp: extremely late bound, maximal expressivity for minimal implementation effort, etc. But I eventually ran up against limits of my understanding of the stack beneath my tiny implementation. (And also the dangers of putting off optimization for too long.)

  2. Forth allows you to implement it quickly and with minimum requirements of the stack beneath it. However it exacts a big cost in safety. You can usually interpret any bit of memory as any type. Trying to add safety back would probably again cause the implementation size to explode (though I don't have evidence for this opinion yet).
     
    I've been on a Forth mailing list in 2018, and played with several Forth implementations. I am particularly attracted to Reva Forth which bootstraps from sub-2k LoC of (32-bit) x86 Assembly without any dependence on the C stack. But it's been challenging to understand the implementation; just figuring out what form a function expects its input data to be in has been non-trivial. Perhaps I'm just unsufficiently immersed in the Forth eco-system. It would probably help to have someone next to me to ask questions of.

Stepping back from my experiences, the key attributes of a stack seem to be a) how comprehensible the implementation is, and b) how much it allows one to collaborate with others. Mainstream stacks enable collaboration in the short to medium term, but gradually grow incomprehensible to most people and thereby divide the community between implementors and users. Factor in the explosion in layers of abstraction, and over time each of us programmers understands a vanishingly tiny fraction of the stacks we rely on. I think this is the biggest problem in the field today.

In contrast, simple stacks like Forth are far more comprehensible. But the poor safety guarantees make it intractable in practice to share code. Forth doesn't seem to be a write-once language/stack. But it does feel like a solipsistic stack, with everyone working in their own little private universe.

Is there a way to attain both a) and b), to make a simple stack – with basic guardrails for newcomers reading strange code – that doesn't grow monotonically until it alienates most people? That's been the subject of much of my noodling lately. After a couple of years spent building a low-level VM language that focuses on maximal expressivity for tests rather than code, I've been thinking about ways to either get it off the C/libc dependency (I call this linear bootstrapping, to distinguish it from Lisp's classic metacircular bootstrapping. Building a compiler without access to a compiler.), or to retrofit these features to some simple conventional OS. No finished ideas yet. Most promising proto-idea: conventional compilers are complex largely because they introduce IRs that are first-class languages entirely unrelated to either source or target language. What if we designed a low-level language where raw machine code was valid? This would permit designing a compiler as a series of pass-through layers, each transforming some patterns but otherwise blindly transmitting what it doesn't recognize. You'd need less infrastructure to manage multiple languages in a single codebase. Newcomers would also have an easier time understanding the compiler because they'd only need to juggle a single language in their head.

Some prior work that seems loosely relevant:
a) Edmund Grimley Evans, "Bootstrapping a compiler from nothing"
b) https://github.com/kragen/stoneknifeforth
c) https://github.com/nineties/amber

Aeneas Mackenzie

ddrone: Sorry, I will explain better, although I think Kartik Agaram has understood what I meant.

I am interested in languages that achieve some level of extensibility and flexibility "from scratch". Often in small lisp implementations people use host-level lambdas to achieve functions, but this is not a small implementation, this is the entire host language plus the implementation. A lisp interpreter starting from only a few functions could be minimal, because (in theory) you only need those functions, an environment of definitions and a REPL. The GC is a complication here. Similarly forth has been implemented in extremely small packages, and a REPL can be made which only needs an initial small environment.

Kartik: thanks for those references, they are very interesting.

I may have answered most of my own question by phrasing it better: is there a way meaningfully different from "REPL + small environment" to create a fully featured language? What are some examples of ways to structure the REPL or environment to create different features? Forth and Lisp are very different languages although they can both be small.

Kartik Agaram

I see, so you're only interested in the UX of a small language? Yes, I have a hard time imagining a third alternative to a) REPL and b) batch run (after optional AOT or JIT compilation).

One slightly different spin on the REPL option is the teaching environment of my Mu project: http://akkartik.name/post/mu. It looks and feels like a REPL, but each command/response pair is constantly running, refreshing with the latest code. And you can mark a response as "expected" so that it turns red if it deviates. I don't know if you consider that sufficiently different. It's basically a way to get students gradually used to the idea of tests.