|Logiweb system pages|
|The Logiweb protocol|
Logiweb servers communicate with each other and with Logiweb clients using 'Logiweb messages'.
A Logiweb message is a sequence of bytes, i.e. a sequence of bits whose length is a multiple of eight. On this page, messages are written as sequences of bits using mixed endian notation.
In the following, the syntax of messages is presented bottom up, meaning that the actual definition of the syntax of a messages comes last.
Table of contents
Cardinals and bit vectors
A byte denotes a value in 0..255. As an example, mixed endian 0000 0010 denotes 2.
A middle or end septet denotes a value in 0..127. As examples, the middle septet 1000 0010 and the end septet 0000 0010 both denote 2.
In this context, a 'Cardinal' is a non-negative integer. Cardinals are expressed little endian base 128. As an example,
1000 0011 0000 0010
denotes 3+128*2 = 3+256 = 259.
A 'vector' encodes a sequence of bits. The 'length' field of the vector indicates the number of bits in the vector. The 'bytes' field of the vector contains the actual bits, padded with zero to seven bits to make the length of the the 'bytes' field divisible by eight. As an example, consider the vector
<1,1,0,0, 0,0,0,0, 1,1,0,0, 0,1,0,0>In mixed endian binary (each byte written in 'reverse' order with the MSB first and LSB last), the vector reads:
0000 0011 0010 0011
The vector above encode the three-bit sequence <1,1,0>. The length field contains binary 3. This is easiest to see in the mixed endian notation. The bytes field contains the three bits <1,1,0> followed by garbage. This is easiest to see in the bit-vector notation.
The grammar of messages is *not* context free because the 'vector' syntax class cannot be expressed in a context free manner.
Each non-negative integer can be represented in more than one way. As an example, the syntax class x05 covers all bit-vectors that represent the number five.
Identifiers are cardinals used for identifying something. When parsing an identifier one may choose to parse it as a cardinal, convert it to a number, and then do further processing based on that number.
At the time of writing, the Logiweb message protocol defines eight kinds of messages. These eight kinds of messages are named 'nop','event', 'ping', 'pong', 'get', 'got', 'put', and 'prefix'; they are identified by the identifiers above.
The Logiweb identifier is a cardinal that a Logiweb server places in pong responses in order to identify itself.
When a Logiweb server receives a Logiweb message from some client, it responds to that client with zero or one message. A Logiweb server never sends out more than one message in response to a single incomming message. This convention is made to avoid abuse of Logiweb servers for denial of service attacks.
When and if a Logiweb server responds to an incomming message, it occasionally responds using an 'event' response, where the 'event' response contains one 'event' out of the three possible events above.
A Logiweb server responds with a 'received' event when it wants to tell a client that it has received one message from that client. When a Logiweb server uses this response, it does not say anything about what it did to the incomming message. One reason for not saying anything could be that a more informative answer could be a give-away to malicious clients.
'Received' events are useful when using datagram protocols like UDP for carrying Logiweb messages. When a client receives a 'received' event it knows that the UDP packet found its way to the server and does not have to be retransmitted.
If a Logiweb server responds with a 'received' event to a message that is labelled by a prefix, then the received event will be labelled by the same prefix to allow identification (c.f. the treatment of prefix messages later).
A Logiweb server responds with a 'rejected' event when it wants to tell a client that it has received one message from that client and that it is never ever going to process it (so the client can spare resending it). A 'rejected' event typically means that the implementer of the client has made a bug such as letting the client send malformed messages.
A Logiweb server responds with a 'sorry' event when it wants to tell a client that it has received one message from that client but was unable to process it at this time. As an example, if a (possibly malicious) client starts sending billions of requests that require a response that is longer than the request, then the outgoing connection from the Logiweb server may saturate. In this case the Logiweb server may start responding with 'sorry' events since 'sorry' events are only two bytes long.
When requesting information from a Logiweb server, the kind of information requested is identified by a 'class identifier'. At the time of writing, one may request six classes of information, the most important one being the 'url' class. The other classes are useful for Logiweb servers that cross index or mirror each other.
One may request a Logiweb server to 'add' or 'remove' information from the internal state of the server. The most important use is to 'add' an association from a particular reference to a particular url or to 'remove' such an association.
Needless to say, Logiweb servers must be suspicious towards such requests and must have a strategy for rejecting malicious or false requests for update of the data base. A Logiweb server should always respond with a 'received' message to a request for changing the state as any response that is more informative could be a give-away to malicious clients.
A reference consists of a version, a RIPEMD-160 hash key, and a time stamp. The version is a cardinal and must be equal to one. The RIPEMD-160 hash key contains 20 bytes (160 bits). The timestamp indicates the time of publication expressed in Logiweb time, i.e. as the number of seconds that have elapsed since International Atomic Time (TAI) 00:00:00 on Modified Julian Day (MJD) 0. The number of seconds is expressed on the form m*10^(-e) where m and e are cardinals.
A Logiweb vector is the format in which Logiweb pages are stored on disk and transmitted over networks.
The syntax for Logiweb vectors is irrelevant to the Logiweb protocol, but is included here because it just takes up a few lines in the present context.
A bibliography is a sequence of references consisting of at least one reference. Reference zero (the first reference) is the reference of the page itself. The key of reference zero must be the RIPEMD-160 hash key for all the bytes following the key (i.e. the timestamp of reference zero, the remainder of the bibliography, and the bytes following the bibliography).
A dictionary is an association list which maps indices to arities. The dictionary introduces a list of symbols where a symbol is a pair (r,i) of cardinals. Each entry (i,a) in the dictionary introduces the symbol (r,i) where r is the reference cardinal of the page, i.e. the first reference in the bibliography converted to a cardinal. Furthermore, each entry (i,a) assigns arity a to the symbol (r,i).
If the bibliography has more than one element, then the Logiweb page referenced by reference number 1 (the second element in the list) is termed the 'bed' page. The bed page may specify how to unpack a body from a sequence of bytes to a tree structure. If it does not, then the body must have the format specified by default-format
A flat-tree is a Polish prefix representation of a body. Each cardinal in the flat-tree represents one symbol (r,i) as follows: each cardinal has value i*n+j+1 where i is the index of the represented symbol, n is the number of elements of the bibliography, and the reference of the symbol is reference number j in the bibliography.
Unpacking the Polish prefix representation into a tree requires knowledge of all arities of all symbols, i.e. knowledge of the dictionary of all pages mentioned in the bibliography.
Since symbols are represented by i*n+j+1 where i, n, and j are cardinals, the number zero does not represent a symbol. If the number zero occurs in a flat-tree, then the cardinal following the zero is a length field which defines a cardinal B, and the next B bytes constitute a string of bytes to be included as a string in the body.
These four syntax classes are just aliases for cardinals and vectors; they are introduced to enhance the readability of the definitions below.
A nop (No Operation) request asks a server to do nothing. A server does not respond to nop messages. Nop messages may be used for padding when using a connection based protocol for carrying Logiweb messages.
Event responses are treated above in connection with the definition of the 'event' syntax class.
A ping request asks 'who are you, and what time is it'. The associated pong response answers the first question using the Logiweb identifier and answers the second question by a timestamp. When mirroring a Logiweb server it is reasonable to start asking what time it is and then querying all information residing on the server. Later on, one may then do incremental mirroring by asking for all information that has changed since last.
A prefix message attaches a label to a message. When a server receives a message with a label, it copies the label into the response. This is useful when a client needs to know which responses match which requests. Requests tend to be self-contained so, in general, one does not need to have the request to understand the response. But occasionally it may be convenient or even necessary to be able to match requests and responses.
The prefix message is recursively defined so that one may attach an arbitrary number of labels to a message. Servers are expected to handle messages that are up to 65535 bytes long as a minimum. If the buffers of the server is exhausted by a single ingoing or outgoing message (e.g. because of too many labels), then the server is expected to respond with a 'rejected' message.
The ability to attach labels recursively may be used for relays that let Logiweb messages pass through a firewall since each relay may want to add a label to each incomming message and strip it off the associated response.
The get and got messages allows a client to query the state of the server.
The state of the server associates values to addresses where values as well as addresses are bit vectors. For each address, the state may store values of several classes (at present, the classes are 'update', 'type', 'left', 'right', 'sibling', and 'url', c.f. the definition of 'class identifiers' above).
Furthermore, for each address and class, the state may associate more than one value, in which case each value is identified by an 'index' which is a cardinal. The indexing is chronological such that the value with index 1 is the oldest, the one with index 2 is the second oldest and so on. When requesting a value with index zero or an index larger than the number of values on store, the Logiweb server provides the newest available value.
A get message contains an address, a class, and an index, and asks the server to provide the associated value. If a Logiweb server is too pressed to respond to a get message, it should send a 'sorry' event response. Otherwise, it should respond with a got message.
A got message contains many fields. First, it contains the address, class, and index of the get message.
If the state of the server did contain values for the given address and class, then the got message contains the following: The 'length' field is a copy of the 'length' field of the given 'address'. The 'attribute-count' field contains the total number of attributes with the given address and class. The 'value' field contains the returned value. The 'timestamp' field indicates the time at which that value entered the state of the server. If the requested index is between one and attribute-count, inclusive, then the returned value is the n'th-oldest value on the given address of the given class where n is the value of the given 'index'. Otherwise, the returned value is the newest value on store.
If the state of the server did not contain values for the given address and class, but the server did contain a node for the given address, then the got message contains the following: The 'length' field is a copy of the 'length' field of the given 'address'. The 'attribute-count' field is zero. The 'value' field contain the empty bit-vector. The 'timestamp' field is the current server time.
If the state did not contain a node for the given address, then the server finds the longest prefix of the address for which the server does have a node. Call that node the 'closest node'. In that case, the got message contains the following: The 'length' field is the length of the prefix. The 'attribute-count' field is the number of sibling attributes of the closest node. If the closest node has no sibling attributes, then the 'value' field is the empty bit-vector and the 'timestamp' field contains the current server time. Otherwise, the 'value' field contains a random among the sibling attributes and the 'timestamp' field contains the timestamp of that attribute.
In the last possibility above, the got message is used to redirect the client to another Logiweb server. When being redirected from server to server, the 'length' field should be strictly increasing for each redirection.
A put message suggests that the server should consider to add or remove an association from the given address and class to the given value. What the server does to such suggestions is highly implementation dependent.