Thursday, October 20, 2011

Regular Expressions Should Be Handled At Compile Time!

So, I talked with a friend about what our ideal programming language might look like.

One of the points that he raised was that regular expressions, and some similar constructs, should be known and compiled/handled by the compiler itself, so an error in a regex can be a compile-time error.

That is a GREAT idea!

Our first thought was that regular expressions, and a few other such constructs, should be a language feature. However, language features are really the last resort: they are limited to what the language designer thought of, they cannot easily be extended, and they are a "special case" compared to normal functions, even though behind the scenes they often do nothing more than a normal function would.

We came up with an idea for an additional compiler phase that is accessible to a library via some sort of hook. What's needed is something like the following:

compileTimeImport com.superduper.regex(@NewIsImplied);
...
List list = regex{"^\w+$"}.match("thisshouldmatch");

The curly braces would tell the compiler that we are trying to use a compiler-phase feature, and that it needs to check that all parameters are constants.

That way, if the compiler encounters something like

List list = regex{"^[a-+$"}.match("thisshouldmatch");

it could complain, much like Perl does:


$ perl -e '$x =~ /^[a-+$/;'
Invalid [] range "a-+" in regex; marked by <-- HERE in m/^[a-+ <-- HERE $/ at -e line 1.
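To make the idea a bit more concrete, here is a minimal sketch of such a check, written in Python rather than in the imaginary language above. It assumes a made-up convention in which a plain call named regex stands in for regex{...}; the function name check_regex_literals and the message format are my own invention, not part of any real compiler. The sketch parses the source with the standard ast module and tries to compile every constant pattern it finds with re.compile.

```python
import ast
import re


def check_regex_literals(source: str) -> list:
    """Return one message per regex(...) call whose pattern is invalid or not constant."""
    problems = []
    for node in ast.walk(ast.parse(source)):
        # Only look at calls of the form regex(<something>).
        if not (isinstance(node, ast.Call)
                and isinstance(node.func, ast.Name)
                and node.func.id == "regex"
                and node.args):
            continue
        pattern = node.args[0]
        # The "curly brace" rule: the pattern must be a constant string.
        if not (isinstance(pattern, ast.Constant) and isinstance(pattern.value, str)):
            problems.append(f"line {node.lineno}: pattern is not a constant string")
            continue
        # Compile the pattern now, so a mistake shows up before the program runs.
        try:
            re.compile(pattern.value)
        except re.error as exc:
            problems.append(f"line {node.lineno}: invalid regex: {exc}")
    return problems


# Hypothetical program text; regex(...) is only a stand-in for regex{...}.
program = r'''
ok = regex(r"^\w+$").match("thisshouldmatch")
bad = regex(r"^[a-+$").match("thisshouldmatch")
'''

for message in check_regex_literals(program):
    print(message)
```

Run against the two patterns from this post, the check reports only the second one, and it does so without ever executing the program being checked.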


Of course, you couldn't use that for non-constant expressions, which is a sad and severe limitation. For that case, we might use square brackets instead, like this:

String matchString = ...whatever expression...;

List list = regex["^\w+"+matchString+"$"].match("thisshouldmatch");

So, why the curly or square brackets?

  • First, they imply a "new" operator, and thereby help keep the code short.
  • Second, I want to make sure that the compiler understands me - if I use curly brackets but then pass a non-constant expression, I want it to report an error. Of course, the compiler might be able to figure this out by itself, but I don't want to inadvertently use the wrong form.

If we decide that we would rather let the compiler make that decision automatically, we could simply allow ordinary parentheses:

String matchString = ...whatever expression...;

List list = regex("^\w+"+matchString+"$").match("thisshouldmatch");
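How the automatic decision might work can be sketched as follows, again in Python as a stand-in for the imaginary language. The helper names fold_constant_string and classify_pattern are hypothetical, and the folding only understands string literals joined with +, which is an assumption made to keep the example short. If the whole expression folds to a constant, the pattern is checked early; otherwise the check is left for run time.

```python
import ast
import re


def fold_constant_string(node):
    """Return the string value of node if it is known up front, else None."""
    if isinstance(node, ast.Constant) and isinstance(node.value, str):
        return node.value
    if isinstance(node, ast.BinOp) and isinstance(node.op, ast.Add):
        left = fold_constant_string(node.left)
        right = fold_constant_string(node.right)
        if left is not None and right is not None:
            return left + right
    return None


def classify_pattern(expression: str) -> str:
    """Say whether a pattern expression would be checked early, rejected, or deferred."""
    tree = ast.parse(expression, mode="eval")
    value = fold_constant_string(tree.body)
    if value is None:
        # Some part of the expression is only known while the program runs.
        return "deferred to run time"
    try:
        re.compile(value)
    except re.error:
        return "rejected at compile time"
    return "checked at compile time"


# The inner r"..." keeps the backslash in \w literal inside the parsed expression.
print(classify_pattern(r'r"^\w+" + "$"'))
print(classify_pattern(r'"^[a-" + "+$"'))
print(classify_pattern(r'r"^\w+" + match_string + "$"'))
```

The third call is the interesting one: as soon as a variable such as match_string appears, the sketch gives up on early checking without complaint, which is exactly the silent fallback that the explicit curly and square bracket forms are meant to guard against.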

I think those "little things" might help make programming a much more enjoyable waste of time.
