The graphics were intended for high-quality reproduction in the book. Since we wanted them to consist primarily of typography and colored areas, we selected a vector-based rather than a pixel-based format. In addition, the output file was to be written by an algorithm to ensure a fully automated process. This made it essential to work with an "open" format that had a well-documented syntax and could be easily processed by book layout programs [ad.03]. The Scalable Vector Graphics (SVG) format of the W3 Consortium, which was defined with the support of Adobe and other companies, fulfills these requirements. As an added advantage, it is based on XML and is therefore readable by human beings.
An additional demand on the generation process was that it should permit quick and flexible alterations to the rules of graphics generation. This is why we implemented the algorithm in Macromedia Director, an authoring tool with an interpreted programming language [ad.07]. Macromedia Director allows scripts to be edited directly without recompilation and offers functions for manipulating texts, words and characters. Further, the authoring tool makes it easy to display graphics within the program as well as to output them as a text file in SVG format.
The graphics generation process unfolds sequentially in a number of stages. It is partially repeated for each re-generation of the same graphic. It starts by reading and "cleaning" the text, that is, by converting or deleting blank paragraphs, blank characters, special characters, etc. The text processed in this way is then filtered word by word (see "stop word list" in "Data selection in the present case") and evaluated, yielding a table that contains all words that appear in the essay, plus their frequency. Data preparation only needs to be performed once per essay, and its result can then be stored as an intermediary result. At this point it is possible to compute how many words have a filler function, how many crop up several times and how many appear only once. Values standardized against the entire length of the essay can also be derived.
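The preparation stage described above can be sketched roughly as follows. This is a minimal Python illustration, not the Director script used for the book: the stop word list, the cleaning rules and the names `clean_text` and `analyze` are assumptions made for the example.

```python
import re
from collections import Counter

# Hypothetical stop word list, for illustration only.
STOP_WORDS = {"a", "an", "and", "in", "is", "of", "the", "to"}


def clean_text(raw):
    """Lowercase the text, drop special characters, collapse blank runs."""
    text = re.sub(r"[^\w\s'-]", " ", raw.lower())
    return re.sub(r"\s+", " ", text).strip()


def analyze(raw):
    """Return the word table and summary counts for one essay."""
    words = clean_text(raw).split()
    table = Counter(w for w in words if w not in STOP_WORDS)
    total = len(words)
    return {
        "table": table,
        # Words removed by the stop word filter (filler function).
        "fillers": total - sum(table.values()),
        "repeated": sum(1 for c in table.values() if c > 1),
        "single": sum(1 for c in table.values() if c == 1),
        # Standardized value: share of all words in the essay.
        "relative": {w: c / total for w, c in table.items()} if total else {},
    }
```

Because the result depends only on the essay text, it can be computed once and stored, as the paragraph above notes.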
In a second stage, correlations between the essay in question and other essays are determined by calculating the "secondary author". Both the "secondary author" and relative frequencies are calculated for every word in order to identify which of the two authors and which essay attaches "greatest importance" to the word.
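One plausible reading of this stage, sketched in Python: for each word, the "secondary author" is taken to be the other author whose essay uses the word with the highest relative frequency. The function names and the tie-breaking behavior (the first author encountered wins) are assumptions of this sketch.

```python
def relative_frequencies(counts):
    """Convert absolute word counts into shares of the essay's counted words."""
    total = sum(counts.values())
    return {w: c / total for w, c in counts.items()} if total else {}


def secondary_author(word, others):
    """Return the other author who uses `word` most often, or None.

    `others` maps an author name to that essay's relative frequencies.
    A word that no other essay contains has no secondary author.
    """
    best_author, best_freq = None, 0.0
    for author, freqs in others.items():
        freq = freqs.get(word, 0.0)
        if freq > best_freq:
            best_author, best_freq = author, freq
    return best_author
```

Comparing the word's relative frequency in the essay in question with the value found for the secondary author then shows which of the two attaches greater importance to it.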
This is followed by geometric computations that convert the captured data into graphic structures (in our case primarily circles). The position of each circle's center, as well as its radius, fill color, typeface color and transparency, vary depending on the data.
The graphic structures are expressed in the descriptive language SVG and written into an output file. Once all the descriptions for a content graphic have been pieced together, the file is closed and transferred to a browser program with a software component for the display of SVG files [ad.08]. This component then converts the description into visual output.
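To illustrate how such descriptions can be pieced together, here is a small Python sketch that writes labeled circles as SVG text. It is not the Director code used for the book; the function names, the attribute choices (for example a font size of half the radius) and the 600-unit canvas are assumptions.

```python
from xml.sax.saxutils import escape


def circle_svg(cx, cy, r, fill, opacity, label):
    """Describe one semi-transparent circle with a centered label."""
    return (
        f'<circle cx="{cx:.1f}" cy="{cy:.1f}" r="{r:.1f}" '
        f'fill="{fill}" fill-opacity="{opacity:.2f}"/>\n'
        f'<text x="{cx:.1f}" y="{cy:.1f}" text-anchor="middle" '
        f'font-size="{r / 2:.1f}" fill="{fill}">{escape(label)}</text>'
    )


def build_svg(circles, size=600):
    """Wrap all circle descriptions in a single SVG document string."""
    body = "\n".join(circle_svg(**c) for c in circles)
    return (
        f'<svg xmlns="http://www.w3.org/2000/svg" width="{size}" '
        f'height="{size}" viewBox="0 0 {size} {size}">\n{body}\n</svg>\n'
    )


def write_svg(path, circles):
    """Write the finished description to an output file."""
    with open(path, "w", encoding="utf-8") as handle:
        handle.write(build_svg(circles))
```

The resulting file is plain text and can be opened in any viewer that supports SVG.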
The authoring tool Director also has a display function for vector-based graphics that permits direct display in the system and that would presumably have simplified the development process by making the graphics easier to manipulate during generation. However, we decided to forgo this alternative output for reasons of time.
Our content graphics consist of three levels, each of which represents different information.
The background of each graphic classifies the essay in question within the context of all other essays in the book.
The concentric circles represent the individual essays, with the radius in direct proportion to the essay’s character count.
In addition, each essay is depicted as a pie slice that is color-coded according to Johannes Itten's color circle and whose central angle derives from the number of words that the essay in question shares with that of the secondary author.
The radius of the pie slice also derives from the length of the individual essays (the same is true of the concentric circles) (pic.02). In the context of the book, the essays have been arranged so that the first essay appears at the top of the circle and the others follow in a clockwise direction.
The radius of the white space in the middle of the pie slices is relative to the length of the other essays (see pic.01). Both the distance from the center of the white space to its edge and the distance from its edge to the outermost concentric circle stand in direct proportion to the relationship between the length of the essay in question and that of the longest essay in the book. A short essay is thus represented by a small amount of central white space, a long essay by a large amount. The slices thus have "lengths" corresponding to each of the essays. To illustrate the color code, we have given each slice a solid-colored outer edge that runs along the concentric circle allotted to the corresponding essay.
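The proportional rules for the background can be sketched in Python as follows. The text does not state how the slice angles are normalized, so this sketch assumes that they share the full circle in proportion to the number of coinciding words; the function names and the angle convention (degrees, clockwise from the top of the circle) are likewise assumptions.

```python
def ring_radii(lengths, max_radius):
    """Radius of each essay's concentric circle, proportional to its length.

    `lengths` maps an essay name to its character count; the longest
    essay receives `max_radius`.
    """
    longest = max(lengths.values())
    return {name: max_radius * n / longest for name, n in lengths.items()}


def slice_angles(shared_words):
    """Return (start, end) angles per essay, clockwise from the top.

    `shared_words` maps each other essay, in book order, to the number
    of words it has in common with the essay in question.
    """
    total = sum(shared_words.values())
    angles, start = {}, 0.0
    for name, n in shared_words.items():
        sweep = 360.0 * n / total if total else 0.0
        angles[name] = (start, start + sweep)
        start += sweep
    return angles
```

Because dictionaries preserve insertion order, passing the essays in book order places the first essay at the top and the rest clockwise.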
The barcode structure that appears within the concentric circle of the essay in question breaks down the essay's structure.
Both the central angle and transparency of each bar correspond to the length of the essay’s paragraphs. A long paragraph in the text is depicted by a light shade (i.e. a high degree of transparency) and a wide central angle. The white space at the end of the barcode ring stands for the amount of source information and footnotes at the end of a text [ad.09]. A large amount of white space in the barcode ring indicates an essay with a lot of footnotes and source information.
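A minimal Python sketch of this mapping, assuming that the ring's 360 degrees are divided in proportion to character counts and that opacity falls linearly with paragraph length. The opacity range of 0.2 to 0.9 and the function name are assumptions.

```python
def barcode_bars(paragraph_lengths, notes_length,
                 min_opacity=0.2, max_opacity=0.9):
    """Return (start, end, opacity) for each paragraph bar, in degrees.

    The angle not covered by any bar is the white space that stands for
    footnotes and source information (`notes_length` characters).
    """
    total = sum(paragraph_lengths) + notes_length
    longest = max(paragraph_lengths)
    bars, start = [], 0.0
    for n in paragraph_lengths:
        sweep = 360.0 * n / total
        # Longer paragraph -> lower opacity, i.e. a lighter shade.
        opacity = max_opacity - (max_opacity - min_opacity) * n / longest
        bars.append((start, start + sweep, opacity))
        start += sweep
    return bars
```

For example, paragraphs of 100 and 300 characters followed by 400 characters of notes fill half the ring with bars and leave the other half white.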
The so-called word circles appear in the foreground of the graphic and represent a selection of the most frequently used words in the essay (pic.04). Both the type size of a word and the diameter of its circle are proportional to the word's frequency (and, to some extent, its importance). The word circles' color is the basic color assigned to the author according to the color circle and the essay's position in the book. We have used a transparent shade to permit highlights and accents when different word circles overlap.
On the one hand, the word circles are placed in different pie slices to indicate which secondary author used the word most frequently. On the other, their position on the radial axis between the author in question (in the center) and the secondary author (on the edge) reflects the frequency ratio between the two essays: if the word is more "important" to the secondary author than to the author of the essay in question, the word circle appears closer to the secondary author, and vice versa.
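The text does not give the exact formula for the radial position. One simple possibility, shown here as a Python sketch, is a linear interpolation between the edge of the central white space and the outer ring, weighted by the two relative frequencies. The function name and parameters are hypothetical.

```python
def radial_position(f_own, f_secondary, inner, outer):
    """Distance of a word circle from the center of the graphic.

    f_own and f_secondary are the word's relative frequencies in the
    essay in question and in the secondary author's essay. `inner` is
    the radius where the slices begin, `outer` the outermost radius.
    """
    # Share of the combined frequency held by the secondary author:
    # 0 keeps the word at the author's side, 1 pushes it to the edge.
    weight = f_secondary / (f_own + f_secondary)
    return inner + weight * (outer - inner)
```

With equal frequencies the word lands midway between the two authors. The division is safe as long as the word occurs in the essay in question, so that `f_own` is greater than zero.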
Word circles in the center of the graphic are unallied and have no secondary author. Their type color is gray, and their radial position in the white space reflects their frequency in the text: frequently appearing words are positioned in the center, seldom-used words along the edge. The angular position of each word circle is determined by the position of the word's first letter in the alphabet. In each pie slice, words beginning with a, b or c come first and the rest follow clockwise.
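The alphabetical placement can be sketched in Python as follows, assuming 26 equal angular steps per slice and angles measured in degrees clockwise from the top of the circle. Sending words that do not begin with a letter from a to z to the last step is an arbitrary choice of this sketch, and both function names are hypothetical.

```python
import math
import string


def angle_in_slice(word, start_deg, end_deg):
    """Angle for a word inside a slice, ordered by its first letter."""
    index = string.ascii_lowercase.find(word[0].lower())
    if index < 0:
        index = 25  # assumption: non a-z initials go to the end
    # Use the middle of the letter's step so circles stay inside the slice.
    return start_deg + (index + 0.5) / 26 * (end_deg - start_deg)


def polar_to_svg(radius, angle_deg, cx, cy):
    """Convert a clockwise-from-top angle to SVG x/y (y axis points down)."""
    angle = math.radians(angle_deg)
    return cx + radius * math.sin(angle), cy - radius * math.cos(angle)
```

Combining the angle with a radial distance gives the center of the word circle in SVG coordinates.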
Our goal was to create content graphics for the essays using a highly automated process based on word frequency. In addition to content, we paid close attention to the (subjective) aesthetics of representation. Throughout the development process described here, we repeatedly considered the reader's subjective impression of the graphics and fine-tuned the rules used to convert the collected data into graphical structures. Our objective was to allow readers to draw thematic conclusions about an essay intuitively from the informational graphics.