Compiler construction

Florian Rappl, Department of Theoretical Physics, University of Regensburg

Compiler construction

a tedious task with benefits

Outlook

  • What ?
  • Why ?
  • How ?
  • Examples !

What does a compiler do?

Domo compiler

Divisions

  • Frontend
    • User interaction
    • Input handling
  • Backend
    • Doing the hard stuff
    • No interaction with the user

Frontend

Compiler Frontend

Frontend parts

  • Scanner
  • Tokenizer (Lexical Analysis)
  • Preprocessing (Macros)
  • Parser (Syntax Analysis)
  • Validator (Semantical Analysis)

Backend

Compiler Backend

Possible modules

  • Analyzer
  • Optimizer
  • Code generator
  • Linker
  • Runtime

Abstract Syntax Tree

  • Primary goal: Produce representation of code
  • Code might be represented as an AST
  • This tree can be modified and searched
  • In the end it can be used for output generation
  • Sometimes open for modification

Statements and expressions

  • Statements are closed blocks
  • Expressions are open for connections
  • Most statements require expressions
  • Expressions carry instructions
  • Statements are responsible for logic

A trivial example

Abstract Syntax Tree

A more complex example

Abstract Syntax Tree

Operator precendence

  • Hierarchy required
  • Makes trivial parsing impossible
  • Reverse Polish notation
  • Needs to consider brackets
  • Left-To-Right or Right-To-Left?

Kinds

  • One pass compilers (e.g. Pascal)
  • Multi pass compilers (e.g. Java)
  • Load and go compiler (e.g. D. Basic)
  • Optimizing compilers (e.g. C)
  • JIT compilers (e.g. MSIL)
  • Transpiler (e.g. TypeScript)

Classical compiler scheme

Classical Compiler Scheme

Hybrid compiler scheme

Hybrid Compiler Scheme

Usage

  • Implement scripting
  • Improve flexibility
  • Handle input
  • Import data
  • Generate code
  • Increase efficiency
Compiler NoApp

Creating a parser

  • What's the goal?
    • Individual specification
    • Existing specification
    • Extending existing languages
  • What's the purpose?
  • What's the audience?

Your own language, no more ...

Computer languages

Extending existing languages

C versus C++ No Class

Parsing techniques

  • Simple precedence parser
  • Recursive descent parser
  • Pratt parser
  • Operator-precedence parser (e.g. Precedence climbing)

Precedence climbing

Precedence climbing

Parser generators

  • Some programs generate lexers / parsers
  • e.g. Yacc, Bison, ANTLR, ...
  • They have some advantages, but introduce some problems
  • Main problem: Dependency on the generator

Programming language

  • Any language can be used
  • OOP languages certainly have advantages
  • Performance for the compiler does usually not matter much
  • More important is testability, robustness and flexibility
  • C++, Java and C# offer a lot of possibilities

Compiler performance

Compiler Performance Joke

Possible applications

  • A simple document language that transpiles to LaTeX
  • Describing data formats efficiently
  • A straight forward scripting language
  • Code generator for e.g. some specialized HPC application

Example: AngleSharp

Anglesharp Logo
  • HTML5 parser
  • CSS3 parser
  • Builds the DOM for HTML documents
  • This representation has an API
  • Can be used for e.g. transforming HTML to LaTeX

Example: Mustang

Mustang Logo
  • A numerical programming language
  • Loosly based on YAMP
  • Compiles to MSIL
  • External types (e.g. MKL) can be injected
  • Perfect for scripting applications (e.g. analysis)
Dennis Ritchie
At least for the people who send me mail about a new language that they're designing, the general advice is: do it to learn about how to write a compiler.

Dennis Ritchie