The soul of the beast

Everything about Python's grammar

Pablo Salgado

Abstractions CPython Performance python

# Why

The audience will discover one of the core pieces of the language, one that sits at the center of decisions about which new rules can or cannot be added to Python. They will learn how the particularities of the grammar limit what can be achieved but also keep the language consistent, powerful, and straightforward. Attendees will learn how core developers solved some challenging scenarios that arise from these limitations, and why others cannot be resolved unless the internal mechanism that parses the grammar is significantly transformed. Finally, they will learn how a new rule is added to the CPython grammar, which serves as a perfect example of how all the pieces come together. In summary, the audience will get a more technical answer to why people perceive Python as an easy but powerful language, and will gain some insight into how to understand and extend the pieces that form it. This talk will not only help the audience better understand the design of the language and how grammars and parsers work, but will also help people who want to contribute to CPython understand the general structure of the compiler pipeline and how to work on it.

# Who

This talk is for those who want to understand Python a bit more deeply: not only how everything works under the hood, but also which technical decisions went into its making and what their consequences are. The talk is aimed at all Python programmers regardless of skill level, as everyone will find something suited to their level of expertise:

- Beginner programmers will be introduced to the topic of language grammars and will learn what a grammar is and what its building blocks are. Audience members at this level will also gain insight into how everything is threaded together in CPython.

- Intermediate and advanced programmers will learn some in-depth technical details and how they relate to features they already know and understand. The talk will not only shed light on grammar technicalities, parser features and design, and CPython implementation details, but will also connect many pieces of information to explain how small technical decisions affect the bigger picture.

# Outline

Who am I

What is the Python Grammar

- What is a grammar?
- What grammars look like.
- Elements: terminal symbols, nonterminal symbols, productions.
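
As a rough illustration of these three elements, a grammar can be written down as a mapping from nonterminal symbols to their productions. The toy grammar below (sums of numbers with parentheses) and the names `GRAMMAR`, `nonterminals`, and `terminals` are assumptions made for this sketch; it is not taken from CPython's grammar file.

```python
# Toy grammar, for illustration only (not CPython's actual grammar).
# Keys are nonterminal symbols; each value lists that symbol's productions,
# and each production is a sequence of terminal or nonterminal symbols.
GRAMMAR = {
    "expr": [["term", "PLUS", "expr"], ["term"]],
    "term": [["NUMBER"], ["LPAR", "expr", "RPAR"]],
}

# Nonterminals are the symbols that have productions of their own.
nonterminals = set(GRAMMAR)

# Terminals are every other symbol: they correspond to tokens.
terminals = {
    symbol
    for productions in GRAMMAR.values()
    for production in productions
    for symbol in production
} - nonterminals

print(sorted(nonterminals))
print(sorted(terminals))
```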

The properties of the Python grammar

- Leftmost derivation
- 1 token lookahead
- No epsilon productions! (Plus what epsilon productions are)
- Some immediate consequences of these properties.
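
To make the "1 token lookahead" property concrete, here is a minimal sketch of a hand-written parser for a toy rule set (`expr: term ('+' term)*` and `term: NUMBER | '(' expr ')'`). Every decision is taken by looking at the next token only. The grammar, the `Parser` class, and the `<END>` marker are assumptions for illustration; CPython's own parser is generated from the grammar rather than written by hand like this, and this sketch evaluates the sum instead of building a tree.

```python
import re

TOKEN_RE = re.compile(r"\s*(\d+|[+()])")


def tokenize(text):
    """Split the input into number, '+', '(' and ')' tokens."""
    text = text.rstrip()
    tokens, pos = [], 0
    while pos < len(text):
        match = TOKEN_RE.match(text, pos)
        if match is None:
            raise SyntaxError(f"unexpected character at position {pos}")
        tokens.append(match.group(1))
        pos = match.end()
    tokens.append("<END>")  # hypothetical end-of-input marker
    return tokens


class Parser:
    def __init__(self, text):
        self.tokens = tokenize(text)
        self.pos = 0

    def peek(self):
        # The single token of lookahead: the only thing decisions depend on.
        return self.tokens[self.pos]

    def advance(self):
        token = self.tokens[self.pos]
        self.pos += 1
        return token

    def expr(self):
        # expr: term ('+' term)*
        value = self.term()
        while self.peek() == "+":
            self.advance()
            value += self.term()
        return value

    def term(self):
        # term: NUMBER | '(' expr ')'
        token = self.advance()
        if token.isdigit():
            return int(token)
        if token == "(":
            value = self.expr()
            if self.advance() != ")":
                raise SyntaxError("expected ')'")
            return value
        raise SyntaxError(f"unexpected token {token!r}")


def parse(text):
    parser = Parser(text)
    value = parser.expr()
    if parser.peek() != "<END>":
        raise SyntaxError(f"unexpected token {parser.peek()!r}")
    return value


print(parse("1 + (2 + 3)"))
```

Because the two alternatives of `term` start with different tokens, one token is always enough to choose between them; a rule whose alternatives share a common prefix would not fit this scheme without being rewritten.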

How the Python parser generator works

- General structure of the parser generator.
- Nondeterministic finite automata.
- Deterministic finite automata.
- Some examples (with cool graphs!), generated from the Python grammar and the parser generator, of the actual finite automata that Python uses.
- Concrete syntax trees.
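
The step from a nondeterministic to a deterministic automaton can be sketched with the textbook subset construction. The code below is a generic illustration under assumed names (`EPSILON`, `epsilon_closure`, `nfa_to_dfa`) and an assumed hand-built NFA for the toy rule `expr: term ('+' term)*`; it is not the parser generator's actual code.

```python
from collections import deque

EPSILON = None  # label for arcs that consume no token


def epsilon_closure(nfa, states):
    """Return all NFA states reachable from `states` through epsilon arcs."""
    closure = set(states)
    stack = list(states)
    while stack:
        state = stack.pop()
        for label, target in nfa.get(state, []):
            if label is EPSILON and target not in closure:
                closure.add(target)
                stack.append(target)
    return frozenset(closure)


def nfa_to_dfa(nfa, start):
    """Subset construction: each DFA state is a frozenset of NFA states."""
    start_set = epsilon_closure(nfa, {start})
    dfa = {}
    queue = deque([start_set])
    while queue:
        current = queue.popleft()
        if current in dfa:
            continue
        moves = {}
        for state in current:
            for label, target in nfa.get(state, []):
                if label is not EPSILON:
                    moves.setdefault(label, set()).add(target)
        dfa[current] = {
            label: epsilon_closure(nfa, targets)
            for label, targets in moves.items()
        }
        queue.extend(dfa[current].values())
    return start_set, dfa


# Hand-built NFA for the toy rule  expr: term ('+' term)*
# State 3 is the accepting state.
NFA = {
    0: [("term", 1)],
    1: [(EPSILON, 2), (EPSILON, 3)],  # either enter the loop or finish
    2: [("+", 4)],
    4: [("term", 1)],
}

start, dfa = nfa_to_dfa(NFA, 0)
accepting = [state for state in dfa if 3 in state]
for state, arcs in dfa.items():
    print(sorted(state), {label: sorted(t) for label, t in arcs.items()})
```

In the resulting DFA each state has at most one arc per label, which is what lets a parser pick its next move from a single token.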

Advantages of the grammar (or "why Python is so easy to understand")

- LL(1) grammars are context-free (no state to maintain while parsing).
- LL(1) grammars are simple to implement and very fast to parse.
- LL(1) grammars are very limited, which keeps the language simple.

Disadvantages of the grammar:

- Grammar ambiguity.
- LL(1) grammars need some hacks for very simple things.
- How keyword arguments were incorporated into the grammar with a hack: the grammar rule looks very strange because it is "fixed" later, in the abstract syntax tree.
- Why parenthesized with statements cannot be implemented (with statements made up of multiple elements surrounded by parentheses and separated by commas).
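
The visible effect of the keyword-argument hack can be checked from Python itself. With one token of lookahead the parser cannot tell a keyword name from the start of an ordinary expression, so the rule accepts an expression on the left of `=` and a later stage verifies that it is a plain name. The helper `is_valid` below is a hypothetical name for this sketch, and it only shows the observable behaviour: which internal stage raises the error, and the exact message, depend on the Python version.

```python
import ast


def is_valid(source):
    """Return True if `source` is accepted as Python code, False otherwise."""
    try:
        ast.parse(source)
    except SyntaxError:
        return False
    return True


print(is_valid("f(a=1)"))        # a plain name as keyword
print(is_valid("f(a.b=1)"))      # an attribute access on the left of '='
print(is_valid("f(a + b = 1)"))  # an arbitrary expression on the left of '='
```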

Implementing a new grammar rule in CPython: the arrow operator

- A complete mini-tutorial on how to introduce a new operator, A -> B, that gets executed as A.__rarrow__(B).
- Altering the grammar and generating the new parser.
- Introducing a new token.
- Changing the tokenizer.
- Changing the Abstract Syntax Tree Generator.
- Changing the compiler.
- Implementing the new opcode.
- Implementing the __rarrow__ protocol.
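
The intended semantics of the operator can be emulated in plain Python before touching the interpreter. In the sketch below, the function `arrow` is a hypothetical stand-in for the expression `A -> B`, and the `Pipeline` class is an invented example; the lookup on the type rather than the instance is an assumption that mirrors how Python usually resolves special methods. It says nothing about how the real opcode would be implemented in C.

```python
def arrow(left, right):
    """Hypothetical stand-in for `left -> right`: calls left.__rarrow__(right)."""
    method = getattr(type(left), "__rarrow__", None)
    if method is None:
        raise TypeError(
            f"unsupported operand type for ->: {type(left).__name__!r}"
        )
    return method(left, right)


class Pipeline:
    """Invented example class that opts in to the protocol."""

    def __init__(self, steps=()):
        self.steps = tuple(steps)

    def __rarrow__(self, other):
        # Return a new pipeline with `other` appended as the next step.
        return Pipeline(self.steps + (other,))


# Emulates:  Pipeline() -> "load" -> "clean"
result = arrow(arrow(Pipeline(), "load"), "clean")
print(result.steps)
```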

The future and summary of the talk:

- We have been discussing on the CPython Discourse whether to change the parser generator to something more powerful.
- Dangers and advantages of other parser generators.
- What are other implementations using?
- Summary of the talk

Type: Talk (45 mins); Python level: Advanced; Domain level: Advanced

Pablo Salgado

Bloomberg LP

I'm a Software Developer at Bloomberg LP, where I work in the Python infrastructure team defining how Python is used internally and providing critical infrastructure used by all Python developers at the company. I am also a physicist specializing in general relativity and differential geometry, and almost all of my past research focused on rotating black holes. I am a young Spanish coder wishing to learn as much as possible from everyone and to take advantage of any chance to network.