Diagrams, Visual Languages, and Spatial Parsing

In a diagram, each text-graphic object has a spatial location that is independent of its pattern membership, although the two are usually related in conventional ways. Those conventions are part of the visual language for that kind of diagram.

Binary trees and bar charts are examples of visual languages whose conventions relate spatial arrangement to underlying tree structure. A spatial parser has been written which can process these languages (among others). The parser takes two inputs: a machine-readable visual grammar for the target visual language, and a region on the screen in which atomic elements from an expression in the target language are spatially arranged according to the conventions of that language. Using the grammar, it parses the contents of the region and returns a complex pattern in which copies of the visual atoms from the region are arranged in the structure designated by the grammar.

Thus a visual grammar along with a spatial parser is one way of formalizing the conventions for relating spatial arrangement to tree structure in diagrams.
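
As a rough illustration of the two-input interface described above, here is a minimal Python sketch for one language only, top-down binary tree diagrams. It is not the parser referred to in the text. The names Atom, Node, BinaryTreeGrammar, TOP_DOWN_BINARY_TREE, parse and to_tuple are all hypothetical, and the "grammar" is assumed to be nothing more than two spatial predicates: the root is the single topmost atom, and an atom belongs to the left subtree when it lies to the left of the root.

```python
from dataclasses import dataclass, replace
from typing import Callable, Optional, Sequence


@dataclass(frozen=True)
class Atom:
    """A text-graphic object: a label plus its screen position."""
    label: str
    x: float
    y: float  # screen coordinates, so y grows downward


@dataclass
class Node:
    """One node of the recovered structure, holding a copy of an atom."""
    atom: Atom
    left: Optional["Node"] = None
    right: Optional["Node"] = None


@dataclass
class BinaryTreeGrammar:
    """Hypothetical grammar: one recursive rule stated as two spatial relations."""
    is_root: Callable[[Atom, Sequence[Atom]], bool]
    is_left_of: Callable[[Atom, Atom], bool]


# Assumed conventions: the root is strictly above every other atom,
# and left-subtree atoms have a smaller x than their root.
TOP_DOWN_BINARY_TREE = BinaryTreeGrammar(
    is_root=lambda a, region: all(a.y < b.y for b in region if b is not a),
    is_left_of=lambda a, root: a.x < root.x,
)


def parse(grammar: BinaryTreeGrammar, region: Sequence[Atom]) -> Optional[Node]:
    """Recover tree structure from the spatial arrangement of atoms in a region."""
    if not region:
        return None
    roots = [a for a in region if grammar.is_root(a, region)]
    if len(roots) != 1:
        raise ValueError("arrangement is ambiguous under this grammar")
    root = roots[0]
    rest = [a for a in region if a is not root]
    left = [a for a in rest if grammar.is_left_of(a, root)]
    right = [a for a in rest if not grammar.is_left_of(a, root)]
    # replace() with no changes yields a copy, so the region itself is untouched.
    return Node(replace(root), parse(grammar, left), parse(grammar, right))


def to_tuple(node: Optional[Node]):
    """Render the recovered structure as nested (label, left, right) tuples."""
    if node is None:
        return None
    return (node.atom.label, to_tuple(node.left), to_tuple(node.right))
```

Calling parse(TOP_DOWN_BINARY_TREE, region) on a list of Atom values returns a Node tree built from copies of those atoms, or raises ValueError when no single topmost atom exists. A real visual grammar would carry many rules and richer spatial relations; the point here is only the shape of the interface: grammar plus region in, structured pattern out.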

The purpose of spatial parsing is to aid in the processing of visual languages. As an operational definition of visual language, we say: A visual language is a set of spatial arrangements of text-graphic objects with a semantic interpretation that is used in carrying out communicative actions in the world.*

Spatial parsing deals with the spatial arrangement of the text-graphic objects in a visual phrase from a visual language: Spatial parsing is the process of recovering the underlying syntactic structure of an expression in a visual language from its spatial arrangement.**

Diagrams as Formal Visual Languages

When a person employs text and graphic objects in communication, those objects have meaning under a system of interpretation, or visual language. Formal visual languages are ones which have been explicitly designed to be syntactically and semantically unambiguous. It seems that most diagramming systems would fall under this definition, especially those with published conventions or public agreement on them. A gray area consists of what could be called folk diagrams, which are unambiguous in use but for which there are no explicit conventions.

The remainder of the artifacts produced by human text-graphic activity, such as those found on blackboards and city walls, are here considered not to be diagrams but rather occurrences of informal conversational text graphics. Such images, however, may contain diagrams as embedded parts (and skillful visual linguists may on occasion discover regular underlying structure, thus changing the status of some class of text graphics from informal to formal).


Spatial parsing recovers underlying syntactic structure so that a spatial arrangement of visual objects can be interpreted as a phrase in a particular visual language. Interpretation consists of parsing and then semantic processing so that appropriate action can be taken in response to the visual phrase. Appropriate action may include: assistance for agile manual manipulation of objects, compilation into an internal form representing the semantics, translation into another text-graphic language, or simply execution as an instruction to the computer.

Semantics is discussed in further detail in "Spatial Parsing for Visual Languages" and "Visual Grammars For Visual Languages".

* Note that if we substitute "strings of textual symbols" for "spatial arrangements of text-graphic objects" we have something strikingly similar to a characterization of written natural language. Interestingly, the text-graphic definition includes the textual one: a paragraph of text is one kind of arrangement of text-graphic objects.

** Again the definition is parallel to one for textual parsing: textual parsing is the (rule-governed) process of recovering the underlying syntactic structure (underlying in the sense that it was used to generate the linear form) from the linear form of a sentence.
