
Name

pyk - format for expressing Logiweb pages

Description

Pyk files are source files for the Pyk compiler (pyk(1)).

Pyk is a language for expressing mathematics in a seminatural style.

To learn pyk, simply read the pyk source of the 'base' page at http://logiweb.eu/logiweb/page/base/fixed/vector/page.pyk. The comments in there give much more detail than could reasonably be included here.

An overview follows, however.

Comments

Comments start with one of the three-character sequences ""{ and ""; (two quotation marks followed by a left brace or a semicolon).

Comments that start with ""{ can span any number of lines. They end at the first right brace encountered.

Comments that start with ""; end at the end of the line.

Strings

In their simplest form, strings have the form "..." where the characters between the quotes make up the string.

Inside strings, however, various sequences of characters have a special meaning.

At any time while parsing the string, one particular character is the 'escape' character. The escape character is the ascii quotation mark, unless it is changed by an escape sequence.

While scanning the string, an escape character followed by a space marks the end of the string. Hence, "abc" denotes a three-letter-string since it ends with the escape character (a quotation mark) followed by a space. In this context, a space can be an ascii space character, an ascii newline character, a new comment, or the end of the file. An escape character followed by one of the eight characters ,.[]()<> also marks the end of the string. The character following the escape character is not 'consumed' so e.g. "ab"> is a two letter string followed by a 'greater than' sign.

An escape character followed by an exclamation mark denotes one occurrence of the escape character itself. Hence, "ab"!" denotes a three-letter-string consisting of an ascii small a, an ascii small b, and an ascii quotation mark.

An escape character followed by a question mark changes the escape character to be the character after the question mark. Hence, "ab"?+c+ denotes a three-letter-string consisting of a, b, and c: the plus sign becomes the escape character, and the final plus sign ends the string.

An escape character followed by an ascii small n denotes an ascii newline character. (A newline character inside a string also denotes a newline character, so the newline character is available in two different ways.)

An escape character followed by a minus sign denotes no character.

An escape character followed by a plus sign makes the scanner skip spaces until a non-space is found. In this context, comments and newline characters function as spaces. Hence "ab"+ ""{this is a comment} c" denotes the three-letter-string consisting of a, b, and c. "ab"+ ""{comment} "-c" also denotes this three-letter-string. "ab"+ ""{comment} "- " denotes the three-letter-string consisting of an ascii small a, an ascii small b, and an ascii space character.

An escape character followed by a semicolon makes the scanner skip characters until and including the first end-of-line character.

An escape character followed by a left brace makes the scanner skip characters until and including the first right brace.

Double quote characters following an escape character are ignored. Hence, if the escape character is the double quote character, one may state comments inside strings as ""{comment} instead of "{comment}. Hence, as long as the escape character is the double quote character, one can state comments as ""{comment} both inside and outside strings.
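The escape rules above can be illustrated with a small Python sketch. It is not the scanner used by pyk(1); the names scan_string, skip_blanks, and TERMINATORS are hypothetical. The sketch makes three assumptions: the slash escape is left out, the 'new comment' terminator is left out, and inside a skip started by escape-plus a comment is recognized as the escape character, optional double quotes, and a left brace or semicolon.

```python
TERMINATORS = set(" \n,.[]()<>")  # characters that end a string after an escape


def _after(src, i, stop):
    """Index just past the first `stop` at or after i (end of input if none)."""
    k = src.find(stop, i)
    return len(src) if k < 0 else k + 1


def skip_blanks(src, i, esc):
    """Skip spaces, newlines and comments (assumed form: esc, quotes, { or ;)."""
    while i < len(src):
        if src[i] in " \n":
            i += 1
            continue
        if src[i] == esc:
            j = i + 1
            while j < len(src) and src[j] == '"':
                j += 1
            if j < len(src) and src[j] == "{":
                i = _after(src, j, "}")
                continue
            if j < len(src) and src[j] == ";":
                i = _after(src, j, "\n")
                continue
        break
    return i


def scan_string(src, start=0):
    """Return (string value, index of the unconsumed terminator)."""
    if src[start] != '"':
        raise ValueError("a string must start with a quotation mark")
    esc, out, i = '"', [], start + 1
    while True:
        if i >= len(src):
            raise ValueError("unterminated string")
        if src[i] != esc:
            out.append(src[i])            # ordinary character
            i += 1
            continue
        j = i + 1
        while j < len(src) and src[j] == '"':
            j += 1                        # double quotes after an escape are ignored
        if j >= len(src) or src[j] in TERMINATORS:
            return "".join(out), j        # end of string; terminator not consumed
        op = src[j]
        if op == "!":
            out.append(esc)               # the escape character itself
            i = j + 1
        elif op == "?":
            if j + 1 >= len(src):
                raise ValueError("missing new escape character")
            esc = src[j + 1]              # change the escape character
            i = j + 2
        elif op == "n":
            out.append("\n")
            i = j + 1
        elif op == "-":
            i = j + 1                     # denotes no character
        elif op == "+":
            i = skip_blanks(src, j + 1, esc)
        elif op == ";":
            i = _after(src, j, "\n")      # skip through end of line
        elif op == "{":
            i = _after(src, j, "}")       # skip through first right brace
        else:
            raise ValueError("unsupported escape sequence: " + esc + op)
```

For instance, scan_string('"ab">') yields the two-letter string ab and the index of the 'greater than' sign, which is left for the caller to process.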

An escape character followed by a slash makes the scanner scan characters until the next escape character. The characters A..Z, a..z, 0..9, -, and _ are translated to little endian, 6-bit sequences, all the sequences are concatenated, the concatenated sequence is sliced into 8-bit sequences (discarding dribble bits, if any), and the 8-bit sequences are interpreted little endian as bytes. This allows arbitrary binary data to be included in strings. As an example, "ab"/AA AA IA"-c" translates into (97 98 0 0 0 8 99). Characters other than A..Z, a..z, 0..9, -, _, and the escape character are ignored.
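The bit manipulation can be sketched in Python as follows. The sketch assumes that the 64 characters are numbered 0 to 63 in the order listed above (A..Z, a..z, 0..9, -, _), which agrees with the example, where A stands for 0 and I for 8. The names ALPHABET and decode_slash_escape are hypothetical.

```python
# Assumed numbering: A..Z = 0..25, a..z = 26..51, 0..9 = 52..61, - = 62, _ = 63.
ALPHABET = (
    "ABCDEFGHIJKLMNOPQRSTUVWXYZ"
    "abcdefghijklmnopqrstuvwxyz"
    "0123456789-_"
)


def decode_slash_escape(text):
    """Decode the characters between the slash and the next escape character."""
    bits = []
    for ch in text:
        value = ALPHABET.find(ch)
        if value < 0:
            continue                      # all other characters are ignored
        for i in range(6):                # little endian: lowest bit first
            bits.append((value >> i) & 1)
    out = []
    for start in range(0, len(bits) - 7, 8):   # incomplete final group is dropped
        byte = 0
        for i in range(8):
            byte |= bits[start + i] << i  # little endian within each byte
        out.append(byte)
    return bytes(out)


print(list(decode_slash_escape("AA AA IA")))
```

The six characters give 36 bits, so four bytes are produced and four dribble bits are discarded.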

Disambiguating Strings and Comments

A character sequence starting with ""{ is a comment even though it could have been interpreted as the beginning of a string. As an example, ""{abc} is a comment. ""-{abc}" is a five letter string containing two braces and three letters. ""-"{abc}" is the empty string.

Preprocessing

Pyk files are preprocessed before they are interpreted. During preprocessing, include directives are processed, and files are translated from external character encodings to Logiweb-UTF-8.

Logiweb-UTF-8 is identical to UTF-8 except (1) Code 10 is used as line separator, and (2) Codes 0-9, 11-31, and 127 are illegal.
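The two restrictions can be checked with a few lines of Python. This is only a sketch of the definition above, not code taken from pyk(1); the function name is_logiweb_utf8 is hypothetical.

```python
# Codes 0-9, 11-31, and 127 are illegal; code 10 is the line separator.
ILLEGAL_CODES = set(range(0, 10)) | set(range(11, 32)) | {127}


def is_logiweb_utf8(data):
    """Return True if the bytes are valid UTF-8 and contain no illegal codes."""
    try:
        text = data.decode("utf-8")
    except UnicodeDecodeError:
        return False
    return not any(ord(ch) in ILLEGAL_CODES for ch in text)
```

Note that a file using carriage return plus line feed as line separator fails this check, since code 13 is illegal.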

Include directives have the following format:


   ""#include ( filename [ , filter ] )

Include directives start with a ""# sequence which makes it look like a string escape code. Include directives are processed before strings are parsed, however.

The filename argument and the optional filter argument of the include directive must be strings. These argument strings are parsed as described in the STRINGS section above. During preprocessing, include directives are replaced by the contents of the named files. The included files may themselves contain include directives. The optional filter argument defaults to the filter used for translating the including file. The following source includes file1 and file2. file1 is encoded in latin1 using character 10 as line separator. file2 is encoded the same way as the including file.


   ""#include ( file1 , "/latin1/newline/10" )
   ""#include ( file2 )

Include directives can be put anywhere: at the beginning of a line or in the middle of a line, and inside strings or outside strings. If an included file contains a double quote character and nothing else, then the included file may even start or end a string.

For a description of the filter argument see the filter option in pyk(1).

Document Structure

The overall structure of a pyk document is:


page ::= name bib prio body
name ::= 'PAGE' pagename
bib  ::= 'BIBLIOGRAPHY' { reference . }
prio ::= {{'PREASSOCIATIVE'|'POSTASSOCIATIVE'} construct*}*
body ::= 'BODY' expression

The keywords PAGE, BIBLIOGRAPHY, PREASSOCIATIVE, POSTASSOCIATIVE, and BODY are the default keywords used by the pyk compiler for the five kinds of sections recognized by pyk. One can change the keywords by using the 'keyword' option of pyk(1). This may be used e.g. to customize pyk(1) for a particular language.

For backward compatibility, periods can be replaced by commas in the bibliography, and the last period can be omitted.

The Page Section

The page section has the form 'PAGE' pagename. The page name consists of all characters from 'PAGE' up to but excluding the first newline character. The page name is not allowed to contain quote characters. Multiple space characters are treated as a single space character and leading and trailing space characters are ignored. Comments may occur before 'PAGE' but not in the pagename.

The Bibliography Section

A BIBLIOGRAPHY keyword followed by a PREASSOCIATIVE, POSTASSOCIATIVE, or BODY keyword denotes an empty bibliography.

A BIBLIOGRAPHY keyword followed by one or more references, each terminated by a period (or, for backward compatibility, a comma), denotes a non-empty bibliography.

Each reference has the form:


reference ::= name ref

Each reference points to a Logiweb page. The 'name' must be a string and defines the name under which the given page can be referenced in associativity sections and the BODY section.

The ref can be a string or a 'kana' reference. Examples of references read:


BIBLIOGRAPHY
"ref 1"   nani
          niku tine neta kuse satu  natu nanu nasu kine tetu
          kena suku tutu teti seti  seku kuna sunu tete sena
          setu sike kase kisa sesa  sasi sunu nasa natu .
"ref 2"  "lgw:011E5334EB8606020AD376F0AE6675B5BEE0A277B0B69FCBD8B889A20806".
"ref 3"  "32:BQHGF2M5GUBEAFM2WD45KTVOVVPBOR6OQVN7ZFD3YNCFKEYAB".
"ref 4"  "64:B4xU0suhGIgCTbH8uaWd16L4ieHs2-5yYjbiiigBB".
"ref 5"  "http:base/latest/vector/page.lgw".
"ref 6"  "http://logiweb.eu/base/latest/vector/page.lgw".
"ref 7"  "file:page/base/vector/page.lgw".
"ref 8"  "file:/var/www/html/logiweb/page/base/vector/page.lgw".

Above, [1,2,3,4] are 'fixed' references and [5,6,7,8] are 'temporary' references. When pyk(1) looks up [1,2,3,4], it converts the given references into Logiweb references where a Logiweb reference is a byte vector which uniquely identifies a Logiweb page. Once pyk(1) knows the Logiweb references, it scans its cache for a match, and if no match is found, pyk(1) contacts the local Logiweb server logiweb(1) to locate the referenced page.

When pyk(1) looks up [5,6,7,8], it fetches the referenced page, extracts the Logiweb reference from the first few bytes of that page, and then proceeds as above. If pyk(1) does not find the referenced page in its cache, it reads the entire page from the given reference.

References [5] and [7] are relative references. Reference [7] is relative to the current directory and reference [5] is relative to the 'url' option of pyk(1).

When pyk(1) looks up [5], it discards 'http:' and prepends the 'url' option. The url option may have form 'http:...' or 'file:...'. If the 'url' option has form 'file:...' then reference [5] eventually becomes a reference to the local file system. In the latter case, the reference is relative to the current directory if the path after 'file:' is relative.
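The distinction between fixed and temporary string references, and the expansion of a relative 'http:' reference, might be sketched in Python as below. The function names are hypothetical, kana references are not covered, and the sketch assumes that 'prepends' means plain concatenation with no separator added; the url value used in the usage line is an illustrative assumption, not a documented default.

```python
FIXED_PREFIXES = ("lgw:", "32:", "64:")
TEMPORARY_PREFIXES = ("http:", "file:")


def classify(ref):
    """Classify a string reference by its prefix."""
    if ref.startswith(FIXED_PREFIXES):
        return "fixed"
    if ref.startswith(TEMPORARY_PREFIXES):
        return "temporary"
    return "unknown"


def expand_http(ref, url_option):
    """Expand a relative 'http:' reference using the 'url' option."""
    if ref.startswith("http://"):
        return ref                            # absolute, like reference [6]
    if ref.startswith("http:"):
        # Discard 'http:' and prepend the url option (assumed: no separator added).
        return url_option + ref[len("http:"):]
    return ref


# Hypothetical url option value, chosen only for illustration.
print(expand_http("http:base/latest/vector/page.lgw", "http://logiweb.eu/"))
```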

The Associativity Sections

An associativity section consists of one of the keywords PREASSOCIATIVE or POSTASSOCIATIVE followed by one or more construct declarations. The page section and the associativity sections of a page define the priority and associativity of all constructs accessible on the page.


prio      ::= {{'PREASSOCIATIVE'|'POSTASSOCIATIVE'} construct*}*
construct ::= [id] page name

In a construct, the 'id', if given, must be a decimal number. The 'page' must be a string and the 'name' consists of all characters from the 'page' until the first newline character. As an example, consider the following page:


PAGE my page
BIBLIOGRAPHY
"ref 1"  "http:base/latest/vector/page.lgw"
PREASSOCIATIVE
"ref 1" " plus "
"" " + "
POSTASSOCIATIVE
1 "" " :: "

The page above defines a page named 'my page' and references a 'ref 1' page.

The associativity sections define two constructs, x+y and x::y. When introducing constructs, the double quote serves as a placeholder so that e.g. if(",",") and if"then"else" define a ternary if construct. In constructs, there are implicit spaces around double quotes. Multiple space characters are treated as a single space character and leading and trailing space characters are ignored.

The x+y construct is 'preassociative', i.e. 'left associative' in text that runs left to right, so that x+y+z means (x+y)+z. The x::y construct is postassociative so that x::y::z means x::(y::z). The x+y construct has higher priority than x::y because the associativity section containing x+y occurs before the one containing x::y. For that reason, x+y::u+v means (x+y)::(u+v).
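The grouping rules can be illustrated with a small Python sketch. It is a simplification, not the pyk(1) parser: it handles only binary constructs written between their operands, and the names SECTIONS and group are hypothetical. Sections are listed in source order, so earlier entries have higher priority.

```python
# One entry per associativity section, in the order the sections occur.
SECTIONS = [("pre", ["+"]), ("post", ["::"])]


def group(tokens):
    """Fully parenthesize a flat list of operands and binary operators."""

    def parse(level, pos):
        if level < 0:                         # a plain operand
            return tokens[pos], pos + 1
        kind, ops = SECTIONS[level]
        operands, operators = [], []
        first, pos = parse(level - 1, pos)    # operands use higher priorities
        operands.append(first)
        while pos < len(tokens) and tokens[pos] in ops:
            operators.append(tokens[pos])
            operand, pos = parse(level - 1, pos + 1)
            operands.append(operand)
        if kind == "pre":                     # preassociative: fold from the left
            result = operands[0]
            for op, rhs in zip(operators, operands[1:]):
                result = "(" + result + op + rhs + ")"
        else:                                 # postassociative: fold from the right
            result = operands[-1]
            for op, lhs in zip(reversed(operators), reversed(operands[:-1])):
                result = "(" + lhs + op + result + ")"
        return result, pos

    text, _ = parse(len(SECTIONS) - 1, 0)     # start at the lowest priority
    return text


print(group(["x", "+", "y", "::", "u", "+", "v"]))
```

The sketch writes the outermost pair of parentheses as well, so the last line prints ((x+y)::(u+v)).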

The associativity section containing x+y also contains the 'x plus y' construct imported from the referenced page. For that reason, x+y gets the same priority as 'x plus y'. x+y also implicitly gets the same priority as all constructs on the referenced page (e.g. 'x minus y') which have the same priority as 'x plus y'.

The PAGE section serves as an associativity section so that 'my page' becomes a construct in an associativity section of its own. For that reason, the preamble above defines a page with three constructs: 'my page', x+y, and x::y.

The pyk(1) compiler assigns an id to each construct of a page. The construct mentioned in the PAGE section gets an id of 0. Constructs for which an id is given explicitly get that id. As an example, x::y above gets an id of 1. Constructs for which no id is given explicitly are numbered in the order they occur, avoiding numbers that are assigned explicitly. For that reason, x+y above gets an id of 2.

Each construct of each associativity section has form '[id] page name'. Constructs whose 'page' is the empty string are exported from the page being defined. Constructs whose 'page' is non-empty are imported from the given page. In the latter case, the 'page' must be equal to the name given to the referenced page in the BIBLIOGRAPHY section.
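The numbering rule can be sketched in Python as follows. The function assign_ids is hypothetical, and the sketch assumes that implicit ids are taken from the smallest unused numbers in source order, which is consistent with the example (x+y gets 2 because 0 and 1 are taken).

```python
def assign_ids(page_name, exported):
    """Assign ids to the constructs a page defines.

    `exported` lists (explicit id or None, construct name) in source order.
    """
    ids = {page_name: 0}                      # the PAGE construct always gets 0
    taken = {0} | {cid for cid, _ in exported if cid is not None}
    next_id = 1
    for cid, name in exported:
        if cid is None:
            while next_id in taken:           # avoid explicitly assigned numbers
                next_id += 1
            cid = next_id
            taken.add(cid)
        ids[name] = cid
    return ids


print(assign_ids("my page", [(None, '" + "'), (1, '" :: "')]))
```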

If an id is given for an imported construct, then only the construct with the given id of the given page is imported. Furthermore, that construct is renamed to whatever name is given. As an example, if 'x plus y' of the referenced page has an id of 1, then


1 "ref 1" " ++ "

imports the 'x plus y' construct and renames it to x++y. When no id is given, all constructs with the same priority as the given construct are imported from the given page. Note that each page has its own assignment of ids. For that reason, it is no problem that the 'x plus y' construct of the referenced page and the x::y construct of "my page" both have an id of 1.

Each defined and each imported construct has two names. One of the names is the one described above. The other is a 'page qualified' name. As an example, the x+y construct has the following page qualified name: 'x my page + y'. The 'x plus y' construct has the page qualified name 'x ref 1 plus y'. For constructs that start with a double quote, the page qualified name is constructed by adding the page name after the first double quote. For other constructs, the page qualified name is constructed by adding the page name in front of the construct so that e.g. 'if x then y else z' becomes 'my page if x then y else z'.
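The construction of page qualified names can be sketched in Python. The helper names normalize and qualify are hypothetical; constructs are written with double quotes as placeholders, as in the associativity sections, and the space rules (implicit spaces around double quotes, runs of spaces collapsed, leading and trailing spaces dropped) are applied first.

```python
def normalize(name):
    """Apply the space rules for construct names."""
    return " ".join(name.replace('"', ' " ').split())


def qualify(page, name):
    """Build the page qualified name of a construct."""
    name = normalize(name)
    if name.startswith('"'):
        # Insert the page name after the first double quote.
        return normalize('" ' + page + " " + name[1:])
    # Otherwise put the page name in front of the construct.
    return normalize(page + " " + name)


print(qualify("my page", '" + "'))
print(qualify("my page", 'if"then"else"'))
```

With x and y substituted for the placeholders, the first result reads 'x my page + y'.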

Since 'my page' is itself a construct, it also has a page qualified name: 'my page my page'. But one cannot use 'my page my page' for qualifying names so 'if x then y else z' cannot be written 'my page my page if x then y else z'.

The Body Section

The body section is by far the largest. It contains an expression built up from constructs introduced in the associativity sections and constructs introduced on directly referenced Logiweb pages.

In the body section, multiple spaces count as a single space. Spaces must be present in the body iff they are present in a construct so that e.g. 'my page :: my page' can neither be written 'my page::my page' nor 'my page : : my page'.

As an exception, when '[' or ',' occur after a string, an implicit space is inserted between the string and the given character so that "abc"[ my page ]"def" means "abc" [ my page ] "def".

The Name of the Game

The name "pyk" is constructed from the name "Volapyk" in the same way that Rene Thom constructed the word "versal" from "universal": "pyk" is constructed by removing "Vola" from "Volapyk".

Volapyk was an artificial language constructed from several other languages by simplifying their words and their grammar. As an example, the name of the language itself is constructed from "Vola" which is a simplification of "World" and "pyk" which is a simplification of "speak".

The pyk language may be used for "spoken mathematics" and may, among other things, be entered through a microphone when editing mathematical text.

Author

Klaus Grue, http://logiweb.eu/

See Also

pyk(1), http://logiweb.eu/logiweb/page/base/fixed/vector/page.pyk

