*Gordon Kurtenbach is now with Alias Research Inc., 110 Richmond Street East, Toronto, Canada M5C 1P1.
Abstract
This paper presents a scheme for extending an informal, pen-based whiteboard system (Tivoli on the Xerox LiveBoard) to provide a structured editing capability without violating its free expression and ease of use. The scheme supports list, text, table, and outline structures over handwritten scribbles and typed text. The scheme is based on the system temporarily perceiving the "implicit structure" that humans see in the material, which is called a WYPIWYG (What You Perceive Is What You Get) capability. The design techniques, principles, trade-offs, and limitations of the scheme are discussed. A notion of "freeform interaction" is proposed to position the system with respect to current user interface techniques.
Keywords: freeform interaction, implicit structure, pen-based systems, scribbling, whiteboard metaphor, informal systems, recognition-based systems, perceptual support, list structures, gestural interfaces, user interface design.
Introduction
Our goal is to create computational support for the informal collaborative processes of small groups working together in real time. We are especially concerned with "generative" tasks (creating and assessing new ideas and perspectives, discussing them, playing with them, organizing them, negotiating about them, and so on). Human interaction in such situations is informal, freewheeling, rapid, and subtle.
Computational systems are typically ill-suited to such situations, because they force users to create and deal with more-or-less formalized representations. The overhead of using such representations inhibits the very processes they are meant to support [7, 12, 13]. One of the big challenges for current HCI design is to create systems for informal interaction.
Pen-based systems that allow scribbling on wall-size displays or notepads can support whiteboard or shared-notebook metaphors for interacting with informally scribbled material. The free, easy, and familiar expression permitted by such systems makes them a promising class of tools to support informal interaction.
Our base tool is a large, shared, pen-based electronic display device called the LiveBoard [4]. We have developed a software system, called Tivoli [9], that simulates whiteboard functionality on the LiveBoard.(fn1) (There is a commercial version of Tivoli called MeetingBoard.(fn2)) This paper presents and discusses a new scheme that we have designed and implemented in Tivoli to extend its editing power while remaining simple, natural, and consistent with the informal nature of the tool.
This paper begins by proposing the notion of "freeform interaction" to help pin down what we mean by "informal." We then describe the extended editing scheme, which is based on the system perceiving the "implicit structure" that humans see in the material. This is followed by a discussion of the design principles, trade-offs, limitations, and a comparison to other systems.
FREEFORM INTERACTION
The notion of "informal interaction" is somewhat vague, and so we
define a more operational notion. A graphical editing system allows a user to
manipulate graphical objects (GOs) that have defined positions in a 2-D space.
A
free graphical object (freeGO) is a GO that has no constraints or
structural relations with other GOs; it can be freely operated upon
independently of any other GOs in the space. Any kind of GO -- such as an ink
stroke, a text character, an icon, or a composite GO -- can be a freeGO.
Typical
operations are drawing, erasing, wiping, dragging, and gesturing (for both
selecting and operating). A representation consisting solely of freeGOs is a
freeform representation. The unconstrained interaction enabled with
such
a representation is freeform interaction.(fn3)
Scribbling is a prime example of freeform interaction: strokes can be created (drawn) anywhere without affecting existing strokes, and any stroke can be changed or erased without affecting any other strokes.(fn4)
In contrast, traditional text editing is not freeform, because there is an underlying string structure among the characters; e.g., deleting a character affects the positions of all characters later in the string. To be freeform, characters would all have to be freeGOs, i.e., have no underlying string structure. Erasing some characters would not cause any other characters to move. Such a seemingly limited model of text would have a crucial advantage: characters and strokes could be freely intermixed.
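To make the distinction concrete, here is a minimal sketch (ours, in Python, not Tivoli's actual code) of a freeform representation: every object carries only its own position and content, so erasing one object never repositions another.

```python
from dataclasses import dataclass
from typing import Any, List

@dataclass
class FreeGO:
    """A free graphical object: its own position and content, no links to other GOs."""
    x: float
    y: float
    kind: str            # e.g. "stroke" or "char"; strokes and characters can be intermixed
    data: Any = None     # ink points, a character code, etc.

class FreeformCanvas:
    """A freeform representation is just an unordered collection of freeGOs."""
    def __init__(self) -> None:
        self.objects: List[FreeGO] = []

    def add(self, go: FreeGO) -> None:
        self.objects.append(go)      # drawing never disturbs existing objects

    def erase(self, go: FreeGO) -> None:
        self.objects.remove(go)      # erasing never moves any other object
```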
The structure that users perceive in freeform material (e.g., seeing a column of scribbles as a list) is implicit in the sense that it is perceived by the user but not by the system, because it is not defined or declared to the system. Keeping structure implicit is the essence of freeform interaction. The benefit is that users have the freedom to treat the material any way they want at any time; the cost is that the system cannot take advantage of the implicit structure in supporting the users' operations. Therefore, we would like the system to automatically perceive structure in the material in order to support a WYPIWYG (What You Perceive Is What You Get) capability [11]. But it is crucial to have the system perceive structure in the material only when the user needs support, and to keep the interaction freeform otherwise.
Our experience with LiveBoards and whiteboards is that list-like structures are ubiquitous. Thus, we set out to support the manipulation of four kinds of list-related structures: lists, text, tables, and outlines.
The general design technique is to embed ephemeral perceptual support within freeform interaction: Whenever the user takes an action that implies a structural interpretation, the system temporarily perceives the structure in the material, carries out the current operation according to the expected behavior of that structure, and then returns to freeform interaction. Before discussing specific design techniques, we illustrate how our scheme works.
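Schematically, and only as our own sketch (the function names are hypothetical, not Tivoli's), the pattern is:

```python
from typing import Callable, List

def with_ephemeral_structure(freegos: List[object],
                             perceive: Callable[[List[object]], object],
                             operate: Callable[[object], None]) -> None:
    """Perceive structure only for the duration of one operation.

    `perceive` groups the flat collection of freeGOs into an interpretation
    (lines, items, columns, ...); `operate` edits the freeGOs according to the
    expected behavior of that structure (opening space, moving an item, ...).
    """
    interpretation = perceive(freegos)   # temporary, operation-specific perception
    operate(interpretation)              # behave as the perceived structure implies
    # The interpretation is discarded here: nothing structural is stored, so the
    # very next pen stroke is again handled as purely freeform interaction.
```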
Wedge and caret gestures indicate whether to insert a selection as an item or as text. In the case of dragging, the type of insertion is determined by where the selection is dragged to. If it is dragged to the gap between two lines, it is inserted as a list item; if it is dragged to a point within an item, then it is inserted as text.(fn7)
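The decision can be reduced to a simple geometric test on the drop point; the sketch below is our own illustration with invented coordinates, not Tivoli's code.

```python
from typing import List, Tuple

def insertion_mode(drop_y: float, item_bands: List[Tuple[float, float]]) -> str:
    """Decide how a dragged selection is inserted, from the drop position alone.

    `item_bands` are the (top, bottom) y-extents of the perceived list items,
    ordered top to bottom. A drop inside a band is a text insertion into that
    item; a drop in a gap between bands is a list-item insertion.
    """
    for top, bottom in item_bands:
        if top <= drop_y <= bottom:
            return "insert-as-text"      # within an item: open space horizontally
    return "insert-as-item"              # in a gap: open space vertically

# Example: three items occupying y-bands 0-20, 30-50, 60-80
bands = [(0, 20), (30, 50), (60, 80)]
assert insertion_mode(40, bands) == "insert-as-text"   # inside the second item
assert insertion_mode(25, bands) == "insert-as-item"   # in the gap between items
```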
We might argue that these transitional actions are costless for the user, since the user has to do them in any case. But before making a selection, the user must decide whether to make a structural or a freeform selection. The cost is mental (a choice has to be made [2]), and occasionally users make errors (e.g., selecting freeform but expecting a structural move to occur). We feel the benefits outweigh the costs.(fn9)
There is a uniform set of structural selection gestures: brackets and L-shaped gestures (Figure 6). These gestures work by projecting from their legs to define a rectangular region, which reduces structural selection to simple geometry, i.e., defining a rectangle. The user does not have to commit to a particular structure when a selection is made; the structure is not determined until an operation on the selection is invoked. For example, when an item in a list is selected, it can be moved either to another position in the list (with space opened vertically to make room for it) or to a place within another item (with space opened horizontally to make room for it).
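As an illustration (ours, assuming an axis-aligned display and a left bracket gesture), projection turns the gesture's legs into a rectangle and selection into a point-in-rectangle test:

```python
from typing import List, Tuple

Rect = Tuple[float, float, float, float]   # (left, top, right, bottom)

def select_by_projection(gesture_box: Rect, extent: Rect,
                         objects: List[Tuple[float, float]]) -> List[Tuple[float, float]]:
    """Sketch of projective structural selection for a left bracket '['.

    The bracket's legs give a vertical band (its top and bottom); projecting
    from the legs to the right -- as far as `extent` (a border or the display
    edge) -- yields a rectangle. Selection is then a point-in-rectangle test.
    """
    g_left, g_top, g_right, g_bottom = gesture_box
    _, _, e_right, _ = extent
    left, top, right, bottom = g_left, g_top, e_right, g_bottom   # project legs rightwards
    return [(x, y) for (x, y) in objects
            if left <= x <= right and top <= y <= bottom]

# Example: a bracket at x=10 spanning y=100..140, projected to a border at x=300
picked = select_by_projection((10, 100, 20, 140), (0, 0, 300, 1000),
                              [(50, 120), (350, 120), (60, 200)])
assert picked == [(50, 120)]
```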
Character freeGOs can be made to behave like text by invoking implicit structures. Structurally selecting a horizontal sequence of characters and then moving them will cause space to be opened and closed in a text-like manner. Typing is treated as an implicit structure operation. Making an ink dot on the display creates a type-in point; if the dot is near some characters, it will "snap" into a position so that the typed characters align with existing characters. If there are characters immediately to the right of the type-in point, they are moved to the right to accommodate the newly-typed characters.(fn11)
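A rough sketch of the snapping and shifting behavior (our simplification; the snap radius and character metrics are invented):

```python
from typing import Dict, List, Tuple

def place_type_in_point(dot_x: float, dot_y: float, chars: List[Dict[str, float]],
                        snap_radius: float = 15.0) -> Tuple[float, float]:
    """Snap an ink dot to the baseline of nearby characters, if any.
    Each character is {"x": ..., "y": ...}, with y as its baseline."""
    near = [c for c in chars if abs(c["y"] - dot_y) <= snap_radius]
    if near:
        dot_y = min(near, key=lambda c: abs(c["y"] - dot_y))["y"]   # align with existing text
    return dot_x, dot_y

def make_room_for_typing(point_x: float, point_y: float, n_new: int,
                         char_width: float, chars: List[Dict[str, float]]) -> None:
    """Characters to the right of the type-in point on the same baseline are
    shifted right to accommodate `n_new` newly typed characters."""
    for c in chars:
        if c["y"] == point_y and c["x"] >= point_x:
            c["x"] += n_new * char_width
```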
Structural operations could disturb unrelated material nearby, such as a sketch drawn next to a list. This situation is handled by the concept of borders. Very long strokes are considered to be borders that divide the display into regions, and borders delimit structures; thus, structural operations stay within the confines of borders. For example, if a vertical stroke is drawn between a list and an adjacent sketch, then operations on the list are confined to the list's side of the border, and the sketch is not disturbed.
The most common use of borders is to divide the display into columnar regions for multiple lists. Structural operations occur independently in the different regions. For example, an item can be moved from one list to another across borders. Figure 7 shows two lists with an item selected in the left list. When the user moves the item to the right list, it will be inserted there; and the resulting opening and closing of spaces will be confined by the border so they don't interfere with each other.(fn12)
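A sketch of the border idea (ours; the length threshold is invented): strokes that span most of the display height are treated as vertical borders, and the region confining a structural operation is bounded by the nearest borders on either side.

```python
from typing import List, Tuple

def vertical_borders(strokes: List[dict], display_height: float,
                     length_ratio: float = 0.8) -> List[float]:
    """Treat very long, roughly vertical strokes as borders; return their x positions.
    For this sketch, each stroke is {"x": ..., "top": ..., "bottom": ...}."""
    return sorted(s["x"] for s in strokes
                  if (s["bottom"] - s["top"]) >= length_ratio * display_height)

def confining_region(x: float, borders: List[float],
                     display_width: float) -> Tuple[float, float]:
    """The (left, right) region within which a structural operation at x is confined."""
    left = max([b for b in borders if b <= x], default=0.0)
    right = min([b for b in borders if b > x], default=display_width)
    return left, right

# Example: one border at x=400 splits an 800-wide display into two columns
borders = vertical_borders([{"x": 400, "top": 10, "bottom": 990}], display_height=1000)
assert confining_region(100, borders, 800) == (0.0, 400)   # left list's operations stay left
assert confining_region(600, borders, 800) == (400, 800)   # right list is undisturbed
```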
Handwritten material is rarely neat, and misaligned elements make the system's perception unreliable. To deal with this problem, we provide cleanup operations, which neaten up the alignment of material. In carrying out a cleanup operation, the system must decide whether or not elements are aligned. It is not so important what the system decides; what is important is that the result makes the system's perception clear to the user. The user can then adjust those elements that were misperceived.
For example, the horizontal cleanup operation is useful for tables. Consider the table in Figure 8a, which is taken from our user test data. The user created the table column-wise, and hence the rows are not well aligned horizontally. Note that it is impossible to select the first row of the table, because it dips and the elements are crowded. The horizontal cleanup operation analyzes a table column by column: it identifies the items in each column, finds correspondences between the items in the different columns, and decides what is in each row; it then respaces the elements to make the spacing between rows clear. The result in this example is shown in Figure 8b, where it can be seen that the first row is now easily selected.
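The respacing step can be sketched as follows (our simplification: the k-th element of each column stands in for Tivoli's column-to-column correspondence finding):

```python
from typing import Dict, List

def horizontal_cleanup(columns: List[List[Dict[str, float]]], row_gap: float = 40.0) -> None:
    """Level the rows of a table whose elements are grouped by column.

    Each element is a {"y": ...} dict, and each column is ordered top to bottom.
    The k-th element of every column is taken to be row k, and every row is
    moved to a common, evenly spaced y position so the rows are clearly separated.
    """
    n_rows = max(len(col) for col in columns)
    base_y = min(col[0]["y"] for col in columns if col)       # top of the cleaned table
    for k in range(n_rows):
        target_y = base_y + k * row_gap
        for col in columns:
            if k < len(col):
                col[k]["y"] = target_y                        # elements of row k now line up

# Example: a table entered column-wise, so its first row "dips"
cols = [[{"y": 100}, {"y": 128}], [{"y": 112}, {"y": 131}]]
horizontal_cleanup(cols)
assert cols[0][0]["y"] == cols[1][0]["y"]    # the first row is now level and easy to select
```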
At first we implemented a "lined paper" method for handling lists (which is what other pen-based systems, such as [1, 8], do). We set that scheme aside once we were satisfied that we could recognize lists on a blank surface. It took almost a year of iteration and refinement, with extensive testing within the Tivoli group, before we had a version without "obvious" flaws. At that point there remained many design issues that we needed empirical evidence to address.
We conducted a small set of tests with independent users. In each session of about an hour, a user was trained until he or she was confident, and then carried out a range of tasks while playing the role of "scribe" in a simulated meeting. After only four users, enough major problems had been raised that we had to address them. The problems involved confusion among the various structures (rows, columns, segments, blocks), which at that time were treated as different kinds of selection. During the next few months we redesigned and reimplemented the conceptual model and user interface: the structures were simplified to the composite structural model, and dragging and animation were added.
Another problem raised in the user tests was that our early design for the L-gestures was much too confusing for users. The original L-gestures are shown in Figure 10. They were powerful operations for opening and closing spaces, and their design was perfectly logical: the order in which an L was drawn was significant; the first leg indicated where the space was to be adjusted, and the second leg indicated the direction and extent of the adjustment. But users could not remember this abstract logic. Therefore, we abandoned these operations(fn13) and made all L-gestures simple projective selection gestures (Figure 6). Users have no trouble with these.
Finally, a subset of the implicit structure scheme (most of the aspects described in this paper) was chosen to be "hardened" and incorporated into LiveWorks' MeetingBoard product.
Pen-based systems promise informal interaction. Yet they mostly use the pen to input characters and then treat the text in the standard way. That is to say, they are not freeform in the sense defined in this paper. Perhaps the most notorious pen-based system today is the Apple Newton MessagePad [8]. The Newton uses two basic structures, character strings and structured graphics. However, once handwriting or drawing is interpreted, the interpretation is permanent. Thus handwritten text cannot be treated in a structural manner, and strings of characters generally cannot be freeform. Also, characters and graphics cannot be manipulated together in a structural manner; e.g., a graphic object located within a character string will not be moved when characters prior to it are deleted.
The aha! InkWriter [1] is perhaps the closest system to ours in its basic objectives. Its main goal is to treat handwriting as text. It supports text paragraphs and lists of handwriting or characters. Graphics are treated as separate paragraphs. The system compromises on freeform interaction, because it uses a "lined paper" background. Strokes are interpreted as handwriting only if they occur between the lines, and strokes are interpreted as graphics only if they are more than two lines high (and at least one space from a handwritten paragraph). Once input is interpreted, it remains as either text or graphics. Neither system exhibits the fluidity or flexibility that we feel is necessary for a truly usable informal (i.e., freeform) system.
LIMITS OF THE IMPLICIT STRUCTURE APPROACH
We should put the notion of implicit structure within freeform interaction in perspective. Freeform interaction is appropriate only in situations where constraints would inhibit rather than support a process. There are times when constraints are helpful. We would not be against "freezing" (making explicit) an implicit structure in such a situation. What seems needed is a way to transition from freeform to structured interaction [12]. Treating structure implicitly and ephemerally is useful in early, formative stages of a process.
But it should be understood that there are limits to the implicit structure approach.(fn14) Given the inherent freedom of expression in freeform interaction, it is difficult for users to stay within the confines defined by particular "visual grammars" [6]. Even if users mentally stick to particular structures, there remain the manual problems of neatness, in which users vary considerably. These problems make system perception difficult in general. That is why we have chosen not to implement elaborate recognition grammars, but rather to "perceive" simple visual features (e.g., alignment) that are useful across different structures.
We could mitigate the perceptual problems in several ways. The most heavy-handed way is a structure editor (like an Emacs mode [14]), which is not acceptable. We could provide guidelines (like "lined paper"), which would be acceptable if the user were not confined to them in rigid ways. We have taken the "softest" approach: the use of cleanup operations.
In any case, this approach requires a spirit of cooperation between the user and the system. The users have to follow good-faith "interactional maxims" (analogous to the conversational maxims [5] that people naturally follow) if implicit structure is to work. We suggest a Maxim of Appropriateness, which says that users will only invoke operations that are appropriate given the material at hand (e.g., they will not try to do a list move on a sketch of a face). With experience, the user becomes more attuned to what to expect of the system (i.e., the sense of "appropriateness" becomes highly refined), and the interaction becomes skillful. The user can then have the benefits of structural support as well as freedom of expression.
References
[1] aha! InkWriter Handbook (1993). Mountain
View, CA: aha! software corporation.
[2] Card, S. K., Moran, T. P., & Newell,
A. (1983). The Psychology of Human-Computer Interaction. Hillsdale, NJ:
Lawrence Erlbaum Associates.
[3] Chang, B. W., & Ungar, D. (1993). Animation: from cartoons to the user interface. Proceedings of UIST'93, 45-55. New York: ACM.
[4] Elrod, S., Bruce, R., et al. (1992). LiveBoard: A large
interactive display supporting group meetings, presentations and remote
collaboration. Proceedings of CHI'92. New York: ACM.
[5] Grice, H. P. (1975). Logic and conversation. In P. Cole & J. Morgan (Eds.), Syntax and Semantics 3: Speech Acts. New York: Academic Press.
[6] Lakin, F. (1987). Visual grammars for visual
languages. Proceedings of AAAI'87, 683-688.
[7] Moran, T. P. (1993). Deformalizing computer and
communication systems. Position Paper for the InterCHI'93 Research
Symposium.
[8] Newton MessagePad Handbook (1993). Cupertino, CA:
Apple Computer, Inc.
[9] Pedersen, E. R., McCall, K., Moran, T. P., & Halasz,
F. G. (1993). Tivoli: An electronic whiteboard for informal workgroup
meetings. Proceedings of InterCHI'93, 391-398. New York: ACM.
[10] Robertson, G. G., Card, S. K., & Mackinlay, J. D. (1989). The cognitive co-processor architecture for interactive user interfaces. Proceedings of UIST'89. New York: ACM.
[11] Saund, E., & Moran, T. P. (1994). A
perceptually-supported sketch editor. Proceedings of UIST'94. New York:
ACM.
[12] Shipman, F. M., & Marshall, C. C. (1993). Formality
considered harmful: experiences, emerging themes, and directions. Technical
Report, Department of Computer Science, University of Colorado.
[13] Shipman, F. M., Marshall, C. C., & Moran,
T. P. (1995). Finding and using implicit structure in human-organized spatial
information layouts. Proceedings of CHI'95. New York: ACM.
[14] Stallman, R. (1985). GNU Emacs Manual. Cambridge,
MA: Free Software Foundation.