Spreadsheets as programming languages

Started by Mike Hearn

Mike Hearn

OK - let's try to break the ice on this forum.

I was struck the other day by the realisation that the world's most popular programming language is probably Excel, which can be seen as a kind of pure functional language. What's interesting about Excel is the extent to which it is usable and has seen mass adoption by people who aren't developers and don't think of themselves as programmers.
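To make the "pure functional" reading a bit more concrete, here is a minimal Python sketch. It is my own illustration and not how Excel is implemented: each cell is either a constant or a pure function of other cells, and a value is computed on demand from its dependencies. The `Sheet` class and the cell names are hypothetical, and there is no cycle detection or caching.

```python
from typing import Callable, Dict, Union

Formula = Callable[["Sheet"], float]


class Sheet:
    """A toy sheet: each cell holds a constant or a pure formula over other cells."""

    def __init__(self) -> None:
        self.cells: Dict[str, Union[float, Formula]] = {}

    def set(self, name: str, value: Union[float, Formula]) -> None:
        self.cells[name] = value

    def get(self, name: str) -> float:
        # Evaluate on demand; formulas only read other cells, never mutate them.
        cell = self.cells[name]
        return cell(self) if callable(cell) else cell


sheet = Sheet()
sheet.set("A1", 10.0)
sheet.set("A2", 32.0)
sheet.set("A3", lambda s: s.get("A1") + s.get("A2"))  # like =A1+A2
sheet.set("A4", lambda s: s.get("A3") * 2)            # like =A3*2

print(sheet.get("A4"))  # 84.0
sheet.set("A1", 0.0)    # editing an input changes every dependent result
print(sheet.get("A4"))  # 64.0
```

The only "statement" the user ever writes is a definition of one cell in terms of others, which is arguably why it doesn't feel like programming.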

Modern trends like the one towards FP and especially ReactiveX are in turn taking conventional programming closer to the spreadsheet model.

Thus it seems a shame that spreadsheet theory has seen so little progress. There is Chorus, which is a follow-on from Subtext. There are some interesting papers on converting SQL queries into spreadsheets, and there is a lot of data about the error rate inside spreadsheets (don't look). There is a tutorial on how to build a simple 3D graphics engine in Excel (I am not kidding). And Excel can be used to build full-blown GUIs on top of an underlying "backend" made of spreadsheets.

There's also Unreal Blueprints, an interesting form of visual programming that combines a notion of temporal control flow with FP. My experience has been that users still perceive it as "programming", though, and that tends to scare some people away.

Does anyone know of interesting research in this space? Are there any ideas that could produce some useful leap forward over Excel?

ilyasergey

Mike, thanks for the start of an interesting discussion.

As a matter of fact, there's been plenty of PL-oriented research in this area. Off the top of my head, I can recall advances motivated by research in (A) program synthesis and (B) probabilistic programming. There might be more, but let me start by elaborating on these two.

(A) is represented by research conducted by Sumit Gulwani at MSR Redmond and his collaborators. To begin with, the Excel feature known as "Flash Fill" is a byproduct of his work, targeting a specific case of program synthesis: "programming by example". Here's a good survey paper from CACM 2012. In this vein, I should also mention the following papers:

(B) For the probabilistic programming take, I'd recommend taking a look at the following work by Andy Gordon (also from MSR) and others:

This work helps to turn spreadsheets into probabilistic programs, whose "result" is a distribution of a certain random variable, which can be used to model all kinds of things. This topic is quite large and, perhaps, we should start another thread for it.
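As a rough illustration of that idea (my own sketch, not the approach taken in the work above), one can treat some input cells as random draws and recompute the sheet many times, so that the output cell becomes an empirical distribution rather than a single number. The cell names, the chosen distributions, and the `simulate_profit` helper are all made up for the example.

```python
import random
import statistics
from typing import List


def simulate_profit(rng: random.Random) -> float:
    # "Input cells" are sampled instead of typed in (hypothetical model).
    units_sold = rng.gauss(1000.0, 100.0)  # like cell B1
    unit_price = rng.uniform(9.0, 11.0)    # like cell B2
    fixed_cost = 4000.0                    # like cell B3, an ordinary constant
    # "Formula cell": =B1*B2-B3
    return units_sold * unit_price - fixed_cost


def run(samples: int = 10_000, seed: int = 0) -> List[float]:
    rng = random.Random(seed)
    return [simulate_profit(rng) for _ in range(samples)]


results = run()
print("mean profit:", round(statistics.mean(results), 2))
print("std dev:    ", round(statistics.stdev(results), 2))
print("P(loss):    ", sum(r < 0 for r in results) / len(results))
```

Note that this only does forward sampling. Probabilistic programming proper also covers inference, i.e. conditioning the model on observed data, which this sketch does not attempt.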

Duncan Cragg

> I was struck the other day by the realisation that the world's most popular programming language is probably Excel, which can be seen as a kind of pure functional language. What's interesting about Excel is the extent to which it is usable and has seen mass adoption by people who aren't developers and don't think of themselves as programmers.

I agree that there's a great future for a PL that embraces what makes spreadsheet formulae so accessible to non-technical people.

My own PL has some commonalities with spreadsheets: change one object's state and another can change state according to a pure functional transformation.

Duncan

mathiasx

I've always heard spreadsheets described in terms of dataflow programming. I've worked on Hoplon, which is a dataflow library for ClojureScript (which compiles to JavaScript for browser apps).
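For anyone who hasn't seen the dataflow style, here is a minimal Python sketch of the idea. This is not Hoplon's API, just a hypothetical `Cell` class and `formula` helper: formula cells subscribe to their inputs and recompute whenever an input changes, much like a spreadsheet recalculating.

```python
from typing import Callable, List, Optional


class Cell:
    """A value that pushes updates to the cells derived from it."""

    def __init__(self, value: float = 0.0) -> None:
        self.value = value
        self._dependents: List["Cell"] = []
        self._recompute: Optional[Callable[[], float]] = None

    def set(self, value: float) -> None:
        self.value = value
        self._notify()

    def _notify(self) -> None:
        for dep in self._dependents:
            dep._refresh()

    def _refresh(self) -> None:
        assert self._recompute is not None
        self.value = self._recompute()
        self._notify()


def formula(fn: Callable[..., float], *inputs: Cell) -> Cell:
    """Create a cell whose value is fn(*input values), kept up to date."""
    cell = Cell()
    cell._recompute = lambda: fn(*(c.value for c in inputs))
    for c in inputs:
        c._dependents.append(cell)
    cell._refresh()
    return cell


price = Cell(20.0)
quantity = Cell(3.0)
total = formula(lambda p, q: p * q, price, quantity)
with_tax = formula(lambda t: t * 1.5, total)

print(with_tax.value)  # 90.0
quantity.set(4.0)
print(with_tax.value)  # 120.0
```

One caveat: this naive propagation can recompute a cell more than once when dependencies form a diamond. A more careful design would order the updates so each cell refreshes once per change.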

gasche

There are interesting projects that take inspiration from spreadsheets and some other models (Eve also explored this space, although it is now rather different). I think there would already be a lot of mileage to be had from a spreadsheet program taking programming practices seriously, by being equipped with a type system (see for example "Static Analysis of Spreadsheet Applications for Type-Unsafe Operations Detection", Tie Cheng and Xavier Rival, 2015) or exploring the addition of programming paradigms that work well with the traditional spreadsheet interface.
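To give a flavour of what even a very basic check could look like, here is a toy Python sketch. It is my own and far simpler than the analysis in the paper (and not based on it): it infers a type for each cell and reports formulas that add non-numbers. The `Add` node and the sheet layout are hypothetical, and cyclic references are not handled.

```python
from dataclasses import dataclass
from typing import Dict, List, Union

Value = Union[float, str, bool]


@dataclass
class Add:
    """A formula like =left+right, referring to two other cells by name."""
    left: str
    right: str


Cell = Union[Value, Add]


def cell_type(name: str, sheet: Dict[str, Cell], errors: List[str]) -> str:
    """Return 'number', 'text', 'bool' or 'error' for a cell, recording problems."""
    cell = sheet[name]
    if isinstance(cell, Add):
        left = cell_type(cell.left, sheet, errors)
        right = cell_type(cell.right, sheet, errors)
        if left == right == "number":
            return "number"
        errors.append(
            f"{name}: cannot add {left} ({cell.left}) and {right} ({cell.right})"
        )
        return "error"
    if isinstance(cell, bool):  # test bool first: bool is a subclass of int
        return "bool"
    if isinstance(cell, (int, float)):
        return "number"
    return "text"


sheet: Dict[str, Cell] = {
    "A1": 12.5,
    "A2": "n/a",
    "A3": Add("A1", "A2"),  # like =A1+A2, adds a number to text
    "A4": Add("A1", "A1"),  # like =A1+A1, fine
}
errors: List[str] = []
types = {name: cell_type(name, sheet, errors) for name in sheet}
print(types)
print(errors)
```

The point is only that the check runs without evaluating anything, so the problem in A3 is flagged before the user ever sees a wrong number.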

For example, I think that it would be very nice to build a constraint solver on top of a spreadsheet interface. Each cell could be populated with a domain constraint (so it would play the role of a solver variable), or a cost formula to minimize, and "running" the spreadsheet would search for a solution. I think this could replace domain-specific end-user applications using solvers. One example application would be schedule planning, where a number of people have to be assigned to a number of tasks in the form of time slots spread over several days, with availability constraints for the people (some may be absent on certain days), while trying to give roughly the same amount of total task time to everyone.
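Here is a small Python sketch of that scheduling example, under heavy assumptions: the people, slots and availability are invented, each slot is a "cell" whose domain is the set of people present that day, and the cost "formula" is the gap between the busiest and the least busy person. It uses exhaustive search, which is only viable for tiny inputs; a real tool would presumably hand the same model to a proper constraint or integer-programming solver.

```python
from itertools import product
from typing import Dict, List, Optional, Set, Tuple

# Hypothetical data: time slots (day, label, hours) and who is present on which day.
slots: List[Tuple[str, str, int]] = [
    ("Mon", "morning", 3),
    ("Mon", "afternoon", 2),
    ("Tue", "morning", 3),
    ("Tue", "afternoon", 2),
    ("Wed", "morning", 4),
]
available: Dict[str, Set[str]] = {
    "Ana": {"Mon", "Tue"},
    "Ben": {"Tue", "Wed"},
    "Cy": {"Mon", "Wed"},
}


def solve() -> Optional[Tuple[int, Tuple[str, ...]]]:
    """Try every assignment; return (imbalance, best assignment) or None."""
    people = list(available)
    # Domain constraint per slot: only people present on that day.
    domains = [[p for p in people if day in available[p]] for day, _, _ in slots]
    best: Optional[Tuple[int, Tuple[str, ...]]] = None
    for assignment in product(*domains):
        hours = {p: 0 for p in people}
        for person, (_, _, length) in zip(assignment, slots):
            hours[person] += length
        imbalance = max(hours.values()) - min(hours.values())  # cost to minimize
        if best is None or imbalance < best[0]:
            best = (imbalance, assignment)
    return best


result = solve()
if result is not None:
    imbalance, assignment = result
    print("imbalance (hours):", imbalance)
    for person, (day, label, length) in zip(assignment, slots):
        print(f"{day} {label} ({length}h): {person}")
```

With this data the 14 hours cannot be split evenly three ways, so the best achievable imbalance is one hour.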