TreeMap '97

Authors:
Jerome Brown
Shaun Gittens

Abstract

Effective display of hierarchical data sets remains a major challenge in information visualization, and numerous distinct methods exist for displaying them. One such technique is the treemap, which displays a tree using a 2-D space-filling algorithm as an alternative to the traditional node/edge representation. In this paper, we discuss and demonstrate TreeMap '97, our Windows 95/NT application that builds treemaps from hierarchical data sets, and the features we incorporated into it.

Background

Hierarchical data is data on which a tree structure may be imposed. The directory structure of a computer drive is a good example of such a hierarchy; others include the makeup of large organizations and even the World Wide Web.

Problem

When the amount of data in such a hierarchy grows large, viewing the entire tree and retaining the information sought becomes difficult for the user. A tree can grow very quickly as its depth increases, so it soon becomes impossible to view the tree in its entirety. A common remedy is to add scroll bars so the user can reach sections of the tree that extend outside the viewing window. For hierarchical data, however, this creates a bottleneck in the user's visual system: far less information fits on screen than the user is capable of processing visually. Scrolling also places more stress on the user's ability to recall information and its location in the tree, which can be bothersome for reasonably large trees.

Treemaps

The treemap, developed by Dr. Ben Shneiderman, displays a tree using a 2-D space-filling algorithm as an alternative to the traditional node/edge representation. Because the layout fills the available display area, an entire tree structure is viewable on one screen. The color and size of each node's rectangle can also be mapped to data attributes, giving the user even more information about the hierarchical data set.
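The original treemap layout, usually called slice-and-dice, divides a node's rectangle among its children in proportion to their sizes and alternates between horizontal and vertical splits at each level of the tree. The Python sketch below illustrates that idea under simple assumptions; the `Node` class, its `size` field, and the function name `slice_and_dice` are hypothetical and are not taken from the TreeMap '97 Delphi source.

```python
from dataclasses import dataclass, field
from typing import List, Optional, Tuple

Rect = Tuple[float, float, float, float]  # (x, y, width, height)


@dataclass
class Node:
    name: str
    size: float = 0.0  # weight of a leaf; internal nodes use the sum of their leaves
    children: List["Node"] = field(default_factory=list)

    def total(self) -> float:
        if not self.children:
            return self.size
        return sum(child.total() for child in self.children)


def slice_and_dice(node: Node, rect: Rect, depth: int = 0,
                   out: Optional[List[Tuple[str, Rect]]] = None) -> List[Tuple[str, Rect]]:
    """Return (name, rectangle) pairs for every node, parents before children."""
    if out is None:
        out = []
    out.append((node.name, rect))
    x, y, w, h = rect
    total = node.total()
    if not node.children or total <= 0:
        return out
    offset = 0.0  # fraction of the parent rectangle already handed out
    for child in node.children:
        frac = child.total() / total
        if depth % 2 == 0:  # even depth: children placed side by side
            child_rect = (x + offset * w, y, frac * w, h)
        else:               # odd depth: children stacked top to bottom
            child_rect = (x, y + offset * h, w, frac * h)
        slice_and_dice(child, child_rect, depth + 1, out)
        offset += frac
    return out


if __name__ == "__main__":
    league = Node("league", children=[
        Node("team A", children=[Node("p1", 30), Node("p2", 10)]),
        Node("team B", children=[Node("p3", 60)]),
    ])
    for name, r in slice_and_dice(league, (0, 0, 100, 100)):
        print(name, r)
```

Because each leaf's area is proportional to its size, the leaf rectangles exactly tile the display, which is the property that lets the whole tree fit on one screen.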

Data Source Implementation

TreeMap '97 accepts any hierarchical data set, provided the data file is in a specific delimited format: the one used by the treemap applications at the Human-Computer Interaction Laboratory at the University of Maryland, College Park. The data is stored in plain text files, and a semicolon separates each item in the file, including the header fields, the attribute names, and the node values.

The first item in the file is an integer that TreeMap '97 does not use. The second item is an integer giving the number of attributes each data point has. Next come the names of the attributes, each prefixed with an integer code indicating the type of data the attribute represents: 0 means integer, 1 means real number, and 2 means string.

After the header come the data values. The first open bracket marks the beginning of the data, and every open bracket marks the start of a node. Inside the bracket are all of the data values for that node. If the values are followed by another open bracket, the current node is a parent with children; if the bracket is closed without another open bracket in between, the node is a leaf. Only leaf nodes carry meaningful data; intermediate nodes hold dummy values in their attributes, since no data is associated with them.

For TreeMap '97, we used two data sets: statistics from the National Basketball Association (NBA) and statistics from the New York Stock Exchange (NYSE). Both are described in more detail in the following section.
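The format description above is enough to sketch a reader for it. The Python parser below is a minimal sketch under stated assumptions: that the "open bracket" is `[` with a matching `]`, that each attribute's type code is a single digit written directly before its name (as in `1PPG`), and that intermediate nodes carry dummy values that still parse as their declared types. The sample string and its values are hypothetical, not taken from the HCIL files.

```python
import re
from dataclasses import dataclass, field
from typing import List, Tuple, Union

Value = Union[int, float, str]
# Type codes from the format description: 0 = integer, 1 = real, 2 = string.
CONVERT = {0: int, 1: float, 2: str}


@dataclass
class DataNode:
    values: List[Value]
    children: List["DataNode"] = field(default_factory=list)


def tokenize(text: str) -> List[str]:
    # Assumed syntax: '[' and ']' are tokens on their own; everything else is ';'-separated.
    return [t.strip() for t in re.findall(r"\[|\]|[^;\[\]]+", text) if t.strip()]


def parse(text: str) -> Tuple[List[Tuple[str, int]], DataNode]:
    tokens = tokenize(text)
    pos = 0

    def take() -> str:
        nonlocal pos
        tok = tokens[pos]
        pos += 1
        return tok

    take()                    # first integer: not used by TreeMap '97
    n_attrs = int(take())     # second integer: number of attributes per node
    attrs: List[Tuple[str, int]] = []
    for _ in range(n_attrs):
        tok = take()
        m = re.fullmatch(r"(\d)\s*(.+)", tok)  # type code prefixed to the name
        if m is None:
            raise ValueError(f"bad attribute header: {tok!r}")
        attrs.append((m.group(2), int(m.group(1))))

    def parse_node() -> DataNode:
        if take() != "[":
            raise ValueError("expected '['")
        node = DataNode([CONVERT[code](take()) for _, code in attrs])
        while tokens[pos] == "[":   # another open bracket: this node has children
            node.children.append(parse_node())
        if take() != "]":           # closed with no further '[': node is complete
            raise ValueError("expected ']'")
        return node

    return attrs, parse_node()


if __name__ == "__main__":
    sample = "0;3;2Name;0Games;1PPG;[;NBA;0;0;[;Team;0;0;[;A;80;30.5;][;B;82;21.0;]]]"
    attrs, root = parse(sample)
    print(attrs)
    print([leaf.values for leaf in root.children[0].children])
```

Tokenizing brackets separately from the semicolon-delimited values keeps the recursive parser short: a node's children are simply every bracketed group that follows its values.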

Data Sets

The first data set contains NBA statistics from the end of the 1991-1992 season: every player in the NBA and his statistics for that season, including total games played, scoring average, assist average, rebounding average, and field goal percentage. In all, the file has 48 attributes and covers all twenty-eight teams as well as all of their players.

The other data set contains information from the New York Stock Exchange (NYSE) at the close of business on December 5, 1997, covering all of the companies whose stocks were being bought and sold. The attributes range from volume, value, and the highs and lows for the Dow Jones to the percentage change, index, and performance.

Problems Encountered

TreeMap '97 has worked better than expected, but several pitfalls hindered its development. The first was the color scheme: which attribute should determine color, and which values should be shown with which colors? The current version offers two ways to color. In the first, each distinct attribute value is assigned its own color. Because there are more distinct values than colors to choose from, we had to decide how to assign values to colors, and we chose a first-come, first-served basis: the program assigns colors to the first 7 distinct attribute values it encounters, and all remaining values share a single color. In the second, the program uses one color and varies its intensity as the value of the attribute changes. This method is more effective than the first when an attribute has many different values.

Another problem was deciding how and where to display the values of a selected data item. We placed a group box on the right side of the display. It lists the attributes of the selected item in a list box and shows a legend of the colors and the value associated with each. The group box also contains two slide bars, which solve a related problem: how does the user select the attributes to size and color on? Radio buttons, check boxes, and pop-up menus would be inadequate, because TreeMap '97 deals with data sets having anywhere from 5 to 50 attributes, and 50 check boxes or radio buttons would force the user to scroll up and down the screen, which can become confusing. With a slide bar, the user can move through every attribute before selecting the desired one; one slide bar selects the size attribute and the other selects the color attribute.

A major problem was handling strings in the data file. In some cases the value of an attribute is a string rather than a number, but TreeMap '97 assumes that all attribute values are numeric and reads them into a data structure that accepts only numerical values. To compensate, we converted string values to real values.
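The two coloring modes described above can be sketched as follows. This is a minimal Python illustration, not TreeMap '97's code: the palette entries, the shared overflow color, and the white-to-blue interpolation used for the intensity mode are assumptions chosen for the example (blue matches the program's default intensity color).

```python
from typing import Dict, Hashable, List, Sequence, Tuple

RGB = Tuple[int, int, int]

# Hypothetical palette: seven colors for the first seven distinct values...
PALETTE: List[RGB] = [(228, 26, 28), (55, 126, 184), (77, 175, 74), (152, 78, 163),
                      (255, 127, 0), (255, 255, 51), (166, 86, 40)]
# ...and one color shared by every distinct value seen after those.
OVERFLOW: RGB = (153, 153, 153)


def categorical_colors(values: Sequence[Hashable]) -> Dict[Hashable, RGB]:
    """First come, first served: colors go to distinct values in the order they appear."""
    mapping: Dict[Hashable, RGB] = {}
    for v in values:
        if v not in mapping:
            n = len(mapping)
            mapping[v] = PALETTE[n] if n < len(PALETTE) else OVERFLOW
    return mapping


def intensity_color(value: float, lo: float, hi: float,
                    base: RGB = (0, 0, 255)) -> RGB:
    """One hue whose intensity tracks the value: lo maps to white, hi to `base`."""
    t = 0.0 if hi == lo else (value - lo) / (hi - lo)
    t = min(max(t, 0.0), 1.0)  # clamp values outside [lo, hi]
    r, g, b = (round(255 + t * (c - 255)) for c in base)
    return (r, g, b)
```

`categorical_colors` depends only on the order in which values appear, while `intensity_color` gives every value its own shade at the cost of shades that are hard to tell apart when values are close.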

Using TreeMap '97

Here are some sample applications of TreeMap '97 at work. The NBA and Stock Portfolio Datasets were made available to us courtesy of the Human Computer Interaction Lab at the University of Maryland:

1991-1992 NBA Season Dataset



Here, color is determined by Field Goals per Game and size by Points. Within the Chicago Bulls, Michael Jordan's wide, dark rectangle is unmatched and immediately demonstrates his dominance in the NBA during the 1991-92 season.



Here, color intensity increases with Field Goal Attempts per Game, and size is determined by Three-Point Field Goals per Game. Dominique Wilkins of the Atlanta Hawks appears to have the best combination of the two attributes in question.

Stock Portfolio



Size is determined by the stock's price high and color by its price low. At a glance, Group 3 seems to have the better combination of promising stocks on these two indices (as a whole, its rectangles appear bigger and darker than those of the other groups), yet Nucor Corp. of Group 2 seems to have everyone else beat.

Achievements

TreeMap '97 started with the goal of representing treemaps in an application developed in Delphi. TreeMap '97 dynamically assigns colors to attribute values and lets the user select the attribute on which coloring is based, so the user can watch the display change as the values change. A depth feature lets the user limit how many levels of the hierarchy are displayed. For example, if the depth is set to two, the user sees only two levels of the tree: a parent and its children. This avoids overwhelming the user with all of the data rectangles at once. The user can also traverse from the highest level of the tree down to any leaf node. Another achievement is that TreeMap '97 can display any hierarchical data set in the specified delimited format. It also uses rectangle size to show differences in data values, and the user can dynamically change the attribute on which size is based. Finally, TreeMap '97 can map attribute values to the intensity of a color. Because altering a color's RGB value changes the shade only slightly, each distinct value can receive its own shade, giving TreeMap '97 the whole range of shades of a single color onto which to map values. The default color for this mode is blue.
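The depth limit and top-down traversal can be sketched as operations on the tree before layout. In the minimal Python version below, an illustration rather than TreeMap '97's implementation, `prune` keeps only `max_depth` levels (the root counts as level 1, so a depth of two shows a parent and its children) and turns nodes at the cutoff into leaves sized by their whole subtree, so they still receive the correct area. `drill_down` follows a path of names to make any descendant the new display root. The `Node` class and both function names are hypothetical.

```python
from dataclasses import dataclass, field
from typing import List, Sequence


@dataclass
class Node:
    name: str
    size: float = 0.0
    children: List["Node"] = field(default_factory=list)


def total_size(node: Node) -> float:
    if not node.children:
        return node.size
    return sum(total_size(c) for c in node.children)


def prune(node: Node, max_depth: int) -> Node:
    """Copy of the tree limited to max_depth levels (root = level 1)."""
    if max_depth <= 1 or not node.children:
        # Cutoff reached: collapse the subtree into one leaf of the same total size.
        return Node(node.name, total_size(node))
    return Node(node.name, 0.0, [prune(c, max_depth - 1) for c in node.children])


def drill_down(root: Node, path: Sequence[str]) -> Node:
    """Follow child names from the root; the result becomes the new display root."""
    node = root
    for name in path:
        matches = [c for c in node.children if c.name == name]
        if not matches:
            raise KeyError(name)
        node = matches[0]
    return node
```

Pruning a copy rather than the original tree means the user can raise the depth limit again without reloading the data file.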

Shortcomings

Although TreeMap '97 achieved all of its initial goals, several things could still improve the application. The first is the text written in the rectangles. As the number of nodes increases, each node's rectangle gets smaller, but the text inside does not shrink with it, so the labels are truncated. The font size should change in proportion to the size of the rectangle.

Another shortcoming is the assignment of colors to attribute values. When an attribute has more than 10 distinct values, TreeMap '97 assigns colors to the first 10 distinct values it encounters. This can misrepresent the data: if the colors go to values that appear only once, while values that appear often all fall into the shared remaining color, most of the rectangles take on that one color and the user cannot tell those values apart. A better solution would be to assign colors to the values that appear most often, which would help in discovering patterns or trends.

A final problem is the handling of string attribute values. TreeMap '97 assumes that all attribute values are numeric; it should be able to handle attribute values regardless of data type.
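The proposed fix, giving the palette to the most frequent values instead of the first ones seen, needs only a frequency count. Below is a minimal Python sketch; the function name is hypothetical, the palette is a parameter (with TreeMap '97's limit it would hold ten colors), and the caller supplies the color shared by all remaining values.

```python
from collections import Counter
from typing import Dict, Hashable, List, Sequence, Tuple

RGB = Tuple[int, int, int]


def frequency_colors(values: Sequence[Hashable], palette: List[RGB],
                     other: RGB) -> Dict[Hashable, RGB]:
    """Assign palette colors to the most frequent values; the rest share `other`."""
    # most_common() sorts by count, highest first; ties keep first-seen order.
    ranked = [value for value, _ in Counter(values).most_common()]
    return {value: palette[i] if i < len(palette) else other
            for i, value in enumerate(ranked)}
```

Under a first-come rule, a value that happens to appear first would claim a color even if it occurs only once; ranking by count hands the scarce colors to the values that cover the most rectangles.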

Future Modifications

TreeMap '97 already has some useful features, and future versions can improve on them. Future versions will scale the size of the text in each rectangle with the size of the rectangle, eliminating truncated labels when rectangles become small. They will also change how colors are assigned: instead of coloring the first 10 distinct values, they will assign colors to the values that appear most often, helping users detect trends, popularity, or patterns in an attribute. Future versions will also accept any type of data as an attribute value, so attributes can hold strings as well as numbers. Finally, each parent node will store the averages of its children's values. In the basketball data, for example, the leaf nodes are players and their parents are teams; storing the average of the players' data in each team node yields the team averages, which is useful for seeing how a team is doing as a whole.
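Storing averages in parent nodes amounts to a post-order pass over the tree that replaces each intermediate node's dummy values. In the minimal Python sketch below, a hypothetical design rather than a planned implementation, each parent receives the mean over all leaves beneath it, so a team node holds its players' averages and a league node holds the average over every player. Averaging the children's averages instead would weight each team equally regardless of roster size; which is preferable depends on the question being asked. The `Node` class and `fill_averages` are illustrative names.

```python
from dataclasses import dataclass, field
from typing import Dict, List


@dataclass
class Node:
    name: str
    stats: Dict[str, float] = field(default_factory=dict)  # real data only on leaves
    children: List["Node"] = field(default_factory=list)


def fill_averages(node: Node) -> List[Dict[str, float]]:
    """Overwrite each parent's stats with leaf means; return the subtree's leaf records."""
    if not node.children:
        return [node.stats]
    leaves: List[Dict[str, float]] = []
    for child in node.children:
        leaves.extend(fill_averages(child))
    totals: Dict[str, float] = {}
    counts: Dict[str, int] = {}
    for record in leaves:
        for key, value in record.items():
            totals[key] = totals.get(key, 0.0) + value
            counts[key] = counts.get(key, 0) + 1
    node.stats = {key: totals[key] / counts[key] for key in totals}
    return leaves
```

Counting each key separately means a leaf that lacks an attribute does not drag that attribute's average toward zero.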

Conclusion

There are a variety of ways to represent hierarchical data, each with its own strengths and weaknesses. Treemaps are one of them; they take advantage of the entire screen so that the user receives as much information as possible. TreeMap '97 is one application in a long line of treemap applications. It offers the user a range of features, including coloring by intensity, a controllable depth of tree display, and user-selected attributes for size and color. TreeMap '97 can still improve in several places: text truncation, coloring only the first ten distinct values, and the handling of strings remain problems for its future programmers. TreeMap '97 was a successful start on an interesting problem, and with thoughtful modifications, creative additions, and determination, it has a chance to become as useful as some of the currently existing treemap applications.


