Typesystem

Defining A Typesystem For Your Language

What is a typesystem?

A typesystem is a part of a language definition used to assign types to several nodes in a model written in your language. It is also used to check certain constraints on nodes' types. Information about types of nodes is useful for:

  • finding type errors
  • checking conditions on nodes' types during generation to apply only appropriate generator rules
  • providing information required for certain refactorings (e.g. for the "extract variable" refactoring)
  • and more

Types

Any MPS node may serve as a type. To enable MPS to assign types to nodes of your language, you should create a language aspect for typesystem. The typesystem model for your language will be written in a typesystem language.

Inference Rules

The main concept of typesystem language is an inference rule. An inference rule for a certain concept is mainly responsible for computing a type for instances of that concept.

An inference rule consists of a condition and a body. A condition is used to determine whether a rule is applicable to a certain node. A condition may be of two kinds: a concept reference or a pattern. A rule with a condition in the form of concept reference is applicable to every instance of that concept and its subconcepts. A rule with a pattern is applicable to nodes that match the pattern. A node matches a pattern if it has the same properties and references as the pattern, and if its children match the pattern's children. A pattern also may contain several variables which match everything.

The body of an inference rule is a list of statements which are executed when a rule is applied to a node. The main kind of statements of typesystem language are statements used for creating equations and inequations between types.

Inference Methods

To avoid duplications, one may want to extract identical parts of code of several inference rules to a method. An inference method is just a simple Base Language method marked with an annotation "@InferenceMethod". There are several language constructions you may use only inside inference rules and replacement rules and inference methods, they are: "typeof" expressions, equations and inequations, "when concrete" statements, type variable declarations and type variable references, and invocations of inference methods. That is made for not to use such constructions in arbitrary methods which may be called in arbitrary context, maybe not during type checking.

Equations And Inequations

The main process which is performed by typesystem engine is the process of solving equations and inequations between types. A language designer tells the engine which equations it should solve by writing them in inference rules. To add an equation into an engine, the following statement is used: expr1 :==: expr2, where expr1 an expr2 are expressions which evaluate to a node.

Consider the following use case. You want to say that the type of a local variable reference is equal to a type of variable declaration. So, you write typeof (varRef) :==: typeof (varRef.localVariableDeclaration), and that's all. The typesystem engine will solve such equations automatically.

The above-mentioned expression typeof(expr) (where expr must evaluate to an MPS node) is a language construction which returns a so-called type variable which serves as a type of that node. Type variables become concrete types during the process of equation solving.

In certain situations you want to say that a certain type doesn't have to exactly equal another type, but also may be a subtype or a supertype of that type. For instance, the type of the actual parameter of a method call does not necessarily have to be the same type as that of the method's formal parameter - it can be its subtype. That is, a method which requires Object as a parameter may be applied to a String.

To express such a constraint, you may use an inequation instead of an equation. An inequation expresses the fact that a certain type should be a subtype of another type. It is written as follows: expr1 :<=: expr2.

Subtyping Rules

When the typesystem engine solves inequations, it requires information about whether a type is a subtype of another type. But how does the typesystem engine know about that? It uses subtyping rules. Subtyping rules are used to express subtyping relationship between types. In fact, a subtyping rule is a function which, given a type, returns its immediate supertypes.

A subtyping rule consists of a condition (which can be either a concept reference or a pattern) and a body, which is a list of statements that compute and return a node or a list of nodes that are immediate supertypes of the given node. When checking whether some type A is a supertype of another type B, the typesystem engine applies subtyping rules to B and computes its immediate supertypes, then applies subtyping rules to those supertypes and so on. If type A is among the computed supertypes of type B, the answer is "yes".

Quotations

A quotation is a language construction that lets you easily create a node with a required structure. Of course, you can create a node using smodelLanguage and then populate it with appropriate children, properties and references, using the same smodelLanguage. But there's a simpler - and more visual - way to accomplish that.

A quotation is an expression whose value is the MPS node written inside the quotation. For instance, a value of an expression < 2 + 3 > is a node (of concept PlusExpression), which has two children (of concept IntegerConstant), which both have a property "value" with values 2 and 3, respectively. (See the difference: the value of expression 2 + 3 is 5; the value of expression < 2 + 3 > is the MPS node of the appropriate structure).

Antiquotations

You may consider a quotation to be a "node literal," like a string literal or integer literal. That means its value is known statically, i.e. not only at runtime, but also at design time. However, it's useful to have some dynamic parts in a literal whose value is computed only at runtime. For such a purpose, the quotations language has another construction: antiquotation. An antiquotation can be placed instead of a quoted node's child, reference or property.

For instance, you want to create a ClassifierType with the class ArrayList, but its type parameter is known only dynamically, so you call a method, say, "computeMyTypeParameter()".

Thus, you write the following expression: < ArrayList < %( computeMyTypeParameter() )% > >. The construction %(...)% here is that very antiquotation.

One also may antiquotate reference targets and property values, with ^(...)^ and $(...)$, respectively; or a list of children of one role, using *(...)* .

Examples Of Inference Rules

Here are the simplest basic use cases of an inference rule:

  • to assign the same type to all instances of a concept (useful mainly for literals):
    applicable to concept = StringLiteral as nodeToCheck
    {
      typeof (nodeToCheck) :==: < String >
    }
    
  • to equate a type of a declaration and the references to it (for example, for variables and their usages):
    applicable to concept = VariableReference as nodeToCheck
    {
      typeof (nodeToCheck) :==: typeof (nodeToCheck.variableDeclaration)
    }
    
  • to give a type to a node with a type annotation (for example, type of a variable declaration):
    applicable to concept = VariableDeclaration as nodeToCheck
    {
      typeof (nodeToCheck) :==: nodeToCheck.type
    }
    
  • to establish a restriction for a type of a certain node: useful for actual parameters of a method, an initializer of a type variable, the right-hand part of an assignment, etc.
    applicable to concept = AssignmentExpression as nodeToCheck
    {
      typeof (nodeToCheck.rValue) :<=: typeof (nodeToCheck.lValue)
    }
    

Type Variables

Inside the typesystem engine, a type may be either a concrete type (a node) or a so-called type variable. Also, it may be a node which contains some type variables as its children or further descendants. A type variable is an undefined type which may then become a concrete type, as a result of solving equations that contain this type variable.

Type variables appear at runtime mainly as a result of a "typeof" operation, but you can create them manually if you want to. There's a statement TypeVarDeclaration in typesystem lanuage to do so. You write it like "var T" or "var X" or "var V", i.e. "var" followed by the name of a type variable. Then you may use your variable, for example, in antiquotations to create a node with type variables inside.

Example: an inference rule for "for each" loop. A "for each" loop in Java consists of a loop body, an iterable to iterate over, and a variable into which the next member of an iterable is assigned before the next iteration. An iterable should be either an instance of a subclass of the Iterable interface, or an array. To simplify the example, we don't consider the case where the iterable is an array. Therefore, we need to express the following: an iterable's type should be a subtype of an Iterable of something, and the variable's type should be a supertype of that very something. For instance, you can  write the following:

for (String s : new ArrayList<String>(...)) {
  ...
}

or the following:

for (Object o : new ArrayList<String>(...)) {
  ...
}

Iterables in both examples above have the type ArrayList<String> which is a subtype of Iterable<String>. Variables have types String and Object, respectively, both of which are subtypes of String.

As we see, an iterable's type should be a subtype of an Iterable of something, and the variable's type should be a supertype of that very something. But how to say "that very something" in typesystem language? The answer is, it's a type variable that we use to express the link between the type of an iterable and the type of a variable. So we write the following inference rule:

applicable for concept = ForeachStatement as nodeToCheck
{
  var T ;
  typeof ( nodeToCheck . iterable ) :<=:  Iterable < %( T )% >;
  typeof ( nodeToCheck . variable ) :>=:  T ;
}

Meet and Join types

Meet and Join types are special types which are treated differently by the typesystem engine. Technically Meet and Join types are instances of MeetType and JoinType concepts, respectively. They may have an arbitrary number of argument types which could be any nodes. Semantically, a Join type is a type which is a supertype of all its arguments, and a node which has a type Join(T1|T2|..Tn) can be regarded as if it has type T1 or type T2 or... or type Tn. a Meet type is a type which is a subtype of its every argument, so one can say that a node which has a type Meet(T1&T2&..&Tn) inhabits type T1 and type T2 and.. and type Tn. The separators of arguments of Join and Meet types (i.e. "|" and "&") are chosen respectively to serve as a mnemonics.

Meet and Join types are not unuseful. Meet types appear even in MPS Base Language (which is very close to Java), for instance the type of such an expression:

true ? new Integer(1) : "hello"

is Meet(Serializable & Comparable), because both Integer (the type of new Integer(1)) and String (the type of "hello") implement both Serializable and Comparable.

Join type is useful when, say, you want some function-like concept return values of two different types (node or list of nodes, for instance). Then you should make type of its invocation be Join(node<> | list<node<>>).

Overloaded Operators

Sometimes an operator (like +, -, etc.) has different semantics when applied to different values. For example, + in Java means addition when applied to numbers, and it means string concatenation if one of its operands is of type String. When the semantics of an operator depends on the types of its operands, it's called operator overloading. In fact, we have many different operators denoted by the same syntactic construction.

Let's try to write an inference rule for a plus expression. First, we should inspect the types of operands, because if we don't know the types of operands (whether they are numbers or Strings), we cannot choose the type for an operation (it will be either a number or a String). But we cannot write something like this:

if (typeof( nodeToCheck.leftOperand ).isInstanceOf IntegerType ) {
 ...
}

because when such a rule is applied, the type of the left operand can still be undefined (it may be a type variable) and will be computed later when there are additional equations in the typechecker engine. So, even if the left operand in the example above finally has a definite concrete type, which is IntegerType, such an inspection will say no. Then how do we solve this problem?

To solve this, in typesystem language every rule for a certain concept may contain so-called "child type restrictions". Child type restrictions say than a certain inference rule is applicable only if children's types satisfy a certain condition: that is, only if these types are subtypes or are equal to a specified type. A child type restriction has a form "typeof (this.role) :<=: expr", where "role" means a role of node's child, and "expr" is an expression which is evaluated to the type whose subtype the child's type should be. If a rule contains more than zero child type restrictions, it will be applied to a node only when the type of every node's child mentioned in restrictions becomes concrete, and only those concrete types satisfy the restrictions.

"When Concrete" Blocks

Sometimes one may want to perform more generic and complex analysis with type structure than is allowed by child type restrictions. That is, inspect inner structure of a concrete type. But as we have seen, you can't just inspect a result of "typeof" expression because it may be a type variable at that moment. To solve such a problem you can use a "when concrete" block.

when concrete ( expr as var ) {
  body
}

Here, "expr" is an expression which will evaluate to a mere type you want to inspect (not to a node type of which you want to inspect), and "var" is a variable to which an expression will be assigned. Then this variable may be used inside a body of "when concrete" block. A body is a list of statements which will be executed only when a type denoted by "expr" becomes concrete, thus inside the body of a when concrete block you may safely inspect its children, properties, etc. if you need to.

Replacement Rules

Motivation

Consider the following use case: you have types for functions in your language, e.g. (a 1, a 2,...a N ) -> r, where a 1, a 2, .., a N, and r are types: a K is a type of K-th function argument and r is a type of a result of a function. Then you want to say that your function types are covariant by their return types and countervariant by their argument types. That is, a function type F = (T 1, .., T N) -> R is a subtype of a function type G = (S 1, .., S N) -> Q (written as F <: G) if and only if R <: Q (covariant by return type) and for any K from 1 to N, T K :> S K (that is, countervariant by arguments' types).

The problem is, how to express covariance and countervariance in typesystem language? Using subtyping rules you may express covariance by writing something like this:

nlist <  > result = new nlist <  > ;
for ( node <  > returnTypeSupertype : immediateSupertypes ( functionType . returnType ) ) {
  node <FunctionType> ft = functionType . copy;
  ft . returnType = returnTypeSupertype;
  result . add ( ft ) ;
}
return  result ;

Okay, we have collected all immediate supertypes for a function's return type and have created a list of function types with those collected types as return types and with original argument types. But, first, if we have many supertypes of return type, it's not very efficient to perform such an action each time we need to solve an inequation and second, although now we have covariance by function's return type, we still don't have countervariance by function's arguments' types. We can't collect immediate subtypes of a certain type because subtyping rules give us supertypes, not subtypes.

In fact, we just want to express the abovementioned property: F = (T 1, .., T N) -> R <: G = (S 1, .., S N) -> Q (written as F <: G) if and only if R <: Q and for any K from 1 to N, T K :> S K . For this and similar purposes typesystem language has a notion called "replacement rule."

What's a replacement rule?

A replacement rule is another way to solve inequations. While the standard way is to transitively apply subtyping rules to a should-be-subtype until a should-be-supertype is found among the results (or is never found among the results), a replacement rule, if applicable to a inequation, removes the inequation and then executes its body (which usually contains "create equation" and "create inequation" statements).

Examples

A replacement rule for above-mentioned example is written as follows:

replacement rule  FunctionType_subtypeOf_FunctionType

applicable for  concept = FunctionType as functionSubType <: concept = FunctionType as functionSuperType

rule {
   if ( functionSubType . parameterType . count != functionSuperType . parameterType . count ) {
      error " different parameter numbers " -> equationInfo . getNodeWithError (  ) ;
      return ;
   }
   functionSubType . returnType :<=:  functionSuperType . returnType ;
   foreach ( node <  > paramType1 : functionSubType . parameterType ; node <  > paramType2 : functionSuperType . parameterType ) {
       paramType2 :<=:  paramType1 ;
   }
}

Here we say that a rule is applicable to a should-be-subtype of concept FunctionType and a should-be-supertype of concept FunctionType. The body of a rule ensures that the number of parameter types of function types are equal, otherwise it reports an error and returns. If the numbers of parameter types of both function types are equal, a rule creates an inequation with return types and appropriate inequation for appropriate parameter types.

Another simple example of replacement rules usage is a rule which states that a Null type (a type of null literal) is a subtype of every type except primitive ones. Of course, we can't write a subtyping rule for Null type which returns a list of all types. Instead, we write the following replacement rule:

replacement rule  any_type_supertypeof_nulltype

applicable for  concept = NullType as nullType <: concept = BaseConcept as baseConcept

rule {
   if ( baseConcept . isInstanceOf ( PrimitiveType ) ) {
      error " null type is not a subtype of primitive type " -> equationInfo . getNodeWithError (  ) ;
   }
}

This rule is applicable to any should-be-supertype and to those should-be-subtypes which are Null types. The only thing this rule does is checking whether should-be-supertype is an instance of PrimitiveType concept. If it is, the rule reports an error. If is not, the rule does nothing, therefore the inequation to solve is simply removed from the typesystem engine with no further effects.

Different semantics

A semantic of a replacement rule, as explained above, is to replace an inequation with some other equations and inequations or to perform some other actions when applied. This semantic really doesn't state that a certain type is a subtype of another type under some conditions. It just defines how to solve an inequation with those two types.

For example, suppose that during generation you need to inspect whether some statically unknown type is a subtype of String. What will an engine answer when a type to inspect is Null type? When we have an inequation, a replacement rule can say that it is true, but for this case its abovementioned semantics is unuseful: we have no inequations, we have a question to answer yes or no. With function types, it is worse because a rule says that we should create some inequations. So, what do we have to do with them in our use case?

To make replacement rules usable when we want to inspect whether a type is a subtype of an another type, a different semantic is given to replacement rules in such a case.

This semantic is as follows: each "add equation" statement is treated as an inspection of whether two nodes match; each "add inequation" statement is treated as an inspection of whether one node is a subtype of another; each report error statement is treated as "return false."

Consider the above replacement rule for function types:

replacement rule  FunctionType_subtypeOf_FunctionType

applicable for  concept = FunctionType as functionSubType <: concept = FunctionType as functionSuperType

rule {
   if ( functionSubType . parameterType . count != functionSuperType . parameterType . count ) {
      error " different parameter numbers " -> equationInfo . getNodeWithError (  ) ;
      return ;
   }
   functionSubType . returnType :<=:  functionSuperType . returnType ;
   foreach ( node <  > paramType1 : functionSubType . parameterType ; node <  > paramType2 : functionSuperType . parameterType ) {
       paramType2 :<=:  paramType1 ;
   }
}

In a different semantic, it will be treated as follows:

boolean result = true;
if ( functionSubType . parameterType . count != functionSuperType . parameterType . count ) {
      result = false;
      return result;
   }
   result = result && isSubtype( functionSubType . returnType <:  functionSuperType . returnType );
   foreach ( node <  > paramType1 : functionSubType . parameterType ; node <  > paramType2 : functionSuperType . parameterType ) {
       result = result && isSubtype (paramType2 <: paramType1) ;
   }
return result;

So, as we can see, another semantic is quite an intuitive mapping between creating equations/inequations and performing inspections.

Advanced features of typesystem language

Check-only inequations

Basically, inequations may affect nodes' types, for instance if one of the inequation part is a type variable, it may become a concrete type because of this inequation. But, sometimes one does not want a certain inequation to create types, only to check whether such an inequation is satisfied. We call such inequations check-only inequations. To mark an inequation as a check-only, one should go to this inequation's inspector and should set a flag "check-only" to "true". An "less or equals" sign for check-only inequation visually is gray, while for normal ones it is black, so you can see whether an inequation is check-only without looking at its inspector.

Dependencies

When writing a generator for a certain language (see generator), one may want to ask for a type of a certain node in generator queries. When generator generates a model, such a query will make typesystem engine do some typechecking to find out the type needed. Performing full typechecking of a node's containing root to obtain the node's type is expensive and almost always unnecessary. In most cases, the typechecker should check only the node given. In more difficult cases, obtaining the type of a given node may require checking its parent or maybe a further ancestor. The typechecking engine will check a given node if the computed type is not fully concrete (i.e. contains one or more type variables); then the typechecker will check the node's parent, and so on.

Sometimes there's an even more complex case: the type of a certain node being computed in isolation is fully concrete; and the type of the same node - in a certain environment - is fully concrete also but differs from the first one. In such a case, the abovementioned algorithm will break, returning the type of an node as if being isolated, which is not the correct type for the given node.

To solve this kind of problem, you can give some hints to the typechecker. Such hints are called dependencies - they express a fact that a node's type depends on some other node. Thus, when computing a type of a certain node during generation, if this node has some dependencies they will be checked also, so the node will be type-checked in an appropriate environment.

A dependency consists of a "target" concept (a concept of a node being checked, whose type depends on some other node), an optional "source" concept (a concept of another node on which a type of a node depends), and a query which returns dependencies for a node which is being checked, i.e. a query returns a node or a set of nodes.

For example, sometimes a type of a variable initializer should be checked with the enclosing variable declaration to obtain the correct type. A dependency which implements such a behavior may be written as follows:

target concept: Expression  find source: (targetNode)->JOIN(node< > | Set<node< >>) {
                                            if ( targetNode / . getRole_ (  ) . equals ( " initializer " ) ) {
                                               return  targetNode . parent ;
                                            }
                                            return  null ;
                                         }
source concept(optional): <auto>

That means the following: if the typechecker is asked for a type of a certain Expression during generation, it will check whether such an expression is of a role initializer, and if it is, will say that not only the given Expression but also its parent should be checked to get the correct type for the given Expression.

Previous Next

Labels

 
(None)