
One syntax to rule them all

Last modified on 2008/03/08



General design philosophy

  • No need for a visual environment; a text editor is enough

  • A tool, not a solution

  • No magic - flexible rather than extensible

  • Allow by default

  • Behave as in C/C++/C# by default

Why lists, records and tuples?

These three data constructs are the ones most often found in existing languages. For example, in C we have arrays (lists), structs (records) and function arguments (tuples). In relational databases we have tables, which are simply lists of records.

One might also consider sets, maps, matrices, isomorphisms and many others, but I chose only the smallest set that still makes it possible to describe anything. Thus, there is only one construct for collections (list) and one for single entities (record). Tuples were added later, mainly as a convenient abbreviation of records.

Using these three constructs, you can express all the others by adding tags that describe the intended meaning. For example:

    // The tag states that elements cannot repeat
    set { 1, 2, 3 }

    // The arrow is a syntactic sugar operator
    map { "Bob" -> 23, "Alice" -> 21 }

    // Matrix as a tuple of dimensions and values
    matrix( 2, 2 ):
        2 1
        3 0

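As a rough illustration of how tags alone can carry the meaning of richer collections, here is a minimal Python sketch. The node classes (`ListNode`, `RecordNode`, `TupleNode`), the `interpret` function and the tag semantics are all hypothetical and not part of Harpoon. In particular, it assumes that `"Bob" -> 23` desugars into a two-element tuple, and that a `matrix` tuple holds the row count, the column count and a list of values given row by row.

```python
from dataclasses import dataclass
from typing import Any, Optional


@dataclass
class ListNode:
    """A collection; 'tag' optionally states its intended meaning."""
    items: list
    tag: Optional[str] = None


@dataclass
class TupleNode:
    """Positional values, treated here as an abbreviation of a record."""
    items: list
    tag: Optional[str] = None


@dataclass
class RecordNode:
    """A single entity with named fields."""
    fields: dict
    tag: Optional[str] = None


def interpret(value: Any) -> Any:
    """Map tagged values to native Python types (hypothetical tag semantics)."""
    if isinstance(value, ListNode):
        items = [interpret(v) for v in value.items]
        if value.tag == "set":
            # "set": elements must not repeat.
            if len(set(items)) != len(items):
                raise ValueError("duplicate element in set")
            return frozenset(items)
        if value.tag == "map":
            # "map": each element is assumed to be a 2-tuple produced by "k -> v".
            return dict(items)
        return items
    if isinstance(value, TupleNode):
        items = [interpret(v) for v in value.items]
        if value.tag == "matrix":
            # "matrix": (rows, cols, values) with values listed row by row.
            rows, cols, cells = items
            if len(cells) != rows * cols:
                raise ValueError("matrix size mismatch")
            return [cells[r * cols:(r + 1) * cols] for r in range(rows)]
        return tuple(items)
    if isinstance(value, RecordNode):
        return {name: interpret(v) for name, v in value.fields.items()}
    return value  # scalars pass through unchanged


# The three examples from the text, written as hypothetical in-memory values.
print(interpret(ListNode([1, 2, 3], tag="set")))
print(interpret(ListNode([TupleNode(["Bob", 23]), TupleNode(["Alice", 21])], tag="map")))
print(interpret(TupleNode([2, 2, ListNode([2, 1, 3, 0])], tag="matrix")))
```

A relational table, as mentioned above, is then just an untagged `ListNode` whose elements are `RecordNode` values; no extra construct is needed.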

Why curly braces for lists, round for records and tuples?

  • Because that is how lists and tuples are written in the C-family syntaxes

  • The same bracing for records and tuples, because they are closely related

Why the C-family syntax?

There are generally 3 syntax families:

  1. C - braces for grouping

  2. Pascal - begin-end for grouping

  3. Python - indentation for grouping

I have chosen the C-like approach because the other two are good only for grouping statements, not for grouping numbers and other general data. Braces can be used for both multi-line and single-line collections, leaving more options to the users.

The YAML data language has a mixed solution: it basically uses indentation, but also offers brackets and braces for single-line grouping. I think this is an unnecessary complication of the language, although I admit that it allows creating data documents that look nearly like text documents written by a non-programmer.

Why can't operators and their priorities be user-defined?

Because expressions couldn't be parsed (understood) without knowledge of some non-local context. User-defined operators would create language extensions, which I wanted to avoid (no magic!).

Without that feature, Harpoon remains simple, yet very flexible. It is necessary to add some seemingly superfluous parentheses and spaces, but on the other hand the user can use an unlimited set of operators and always knows which one applies first. Note also that many programming style guides actually recommend using more parentheses than are required (e.g. this one).
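To make the "non-local context" argument concrete, here is a small precedence-climbing parser in Python. It is not Harpoon's parser; the `tokenize` and `parse` functions and the priority tables are hypothetical. The point is that the same token sequence produces different trees depending on a priority table, and with user-defined operators that table could be declared anywhere, far from the expression being read.

```python
import re


def tokenize(text: str) -> list[str]:
    # Words/numbers, single parentheses, or runs of other symbols (one operator each).
    return re.findall(r"\w+|[()]|[^\s\w()]+", text)


def parse(tokens: list[str], priorities: dict[str, int]):
    """Precedence climbing: binary operators are looked up in 'priorities'
    (non-negative, higher binds tighter) and associate to the left."""
    pos = 0

    def primary():
        nonlocal pos
        if pos >= len(tokens):
            raise SyntaxError("unexpected end of input")
        tok = tokens[pos]
        pos += 1
        if tok == "(":
            node = expression(0)
            if pos >= len(tokens) or tokens[pos] != ")":
                raise SyntaxError("missing ')'")
            pos += 1
            return node
        return tok

    def expression(min_prio: int):
        nonlocal pos
        left = primary()
        # Keep consuming operators that bind at least as tightly as min_prio.
        while (pos < len(tokens) and tokens[pos] in priorities
               and priorities[tokens[pos]] >= min_prio):
            op = tokens[pos]
            pos += 1
            right = expression(priorities[op] + 1)
            left = (op, left, right)
        return left

    tree = expression(0)
    if pos != len(tokens):
        raise SyntaxError(f"cannot continue at {tokens[pos]!r}")
    return tree


tokens = tokenize("1 + 2 * 3")
print(parse(tokens, {"+": 1, "*": 2}))  # ('+', '1', ('*', '2', '3'))
print(parse(tokens, {"+": 2, "*": 1}))  # ('*', ('+', '1', '2'), '3')
```

An operator missing from the table (for example `+*` in `1 +* 2`) stops the parse with a `SyntaxError`, which is exactly the situation a reader faces when the operator's definition lives elsewhere. Explicit parentheses, as in `(1 + 2) * 3`, give the same tree under either table.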

Why aren't tags and identifiers distinguished?

It is true that in some situations the same token can be interpreted as a tag or as an identifier, depending on the context. I considered adding a special character to tags (e.g. a '$' prefix), but then I suddenly realized that this was already done: whenever a token is followed by a semicolon (one character), it is interpreted as an identifier! I decided not to solve problems that were already solved.
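The rule above can be sketched as a tiny lexer fragment in Python. Only the "followed by a semicolon means identifier" rule comes from the text; the `classify` function, the word pattern, the requirement that the `;` follow immediately, and the sample input are hypothetical. String literals and comments are not handled.

```python
import re

# A word, optionally followed immediately by ';' (hypothetical word pattern).
WORD = re.compile(r"\b([A-Za-z_]\w*)(;?)")


def classify(source: str) -> list[tuple[str, str]]:
    """Label each word in 'source' as 'identifier' (followed by ';') or 'tag'."""
    return [
        (m.group(1), "identifier" if m.group(2) == ";" else "tag")
        for m in WORD.finditer(source)
    ]


# Made-up input: 'x' is followed by ';', 'set' is not.
print(classify("x; set { 1, 2, 3 }"))
```

Because the decision depends only on the very next character, the lexer never needs context from elsewhere in the document, which fits the "no magic" principle.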