The World Wide Web provides access to millions of digital images. Trying to find interesting images in such a collection is a dauntly task by itself, and if one were to find how or why an image was generated, the task would be extremely time consuming. While many projects have addressed either one of these tasks, none have attempt to combined them in a succinct way. By looking at a small subset of the images on the World Wide Web (i.e.,Library of Congress American Memory collections), the goal of this project is to lay the ground work in searching for images and its relevant data.
The HCIL at University of Maryland and the Library of Congress have developed a prototype for exploration of a digitized collection relating to American culture and history. It uses scroll bars and menus to allow users to perform dynamic queries on the Library of Congress collection. This project improves the prototype by refining the Browse and Select collections interface and adding image searching capabilities.
Our work has its roots in many areas such as data visualization, image querying, and wavelets. Data visualization will be key in coordinating the tasks of image searching and browsing. Image querying and wavelets support image comparison.
One of the leading image query systems that exists today is the the QBIC (Query By Image Content) project at IBM Almaden (Rabitti, 1992) which uses image content as the basis of the queries. Possible search criteria are color, texture, and shapes. In response to a query, a set of images is returned, and the user can then choose those images that satisfy his/her needs. Two types of queries are supported. The user may choose the color, shape, and texture or draw a sketch. The user may perform query by example, in which the user can choose an image returned in response to a query, and ask the system to return other similar images. Also the use of keyword search provide more accurate search results. For example by adding the word "cat" to a query, only images with cats will be returned.
QBIC addresses two problems inherent in image searching. The first problem is that the metric for feature vectors is non-Euclidean. QBIC solved the problem by proving that the non-Euclidean distance can be bounded by a Euclidean distance. Secondly, the vectors representing the images have high dimensionality. This problem was solved by using the Karhunen Loeve transform to reduce the dimensionality of the vectors. One drawback from these solutions is that they may allow false hits, but they would not lead to false dismissals. Also the interface in which images are displayed does not reveal what other images are in the database other than the ones being currently displayed.
In the paper by (Gupta 1997), they describe a system that also uses image content as the basis for queries. A user can view the entire contents of a image database with the interface developed at UCSD. The interface for this system was designed to support an explore-navigate-select-refine cycle of task in order for users to identify objects of interest. The data is displayed by a three dimensional scatterplot. Users can either start by selecting features to place along an axis, or they can begin with a search image. Among the features are color, color distribution, and structure. This enables the user to get a basic look at the database contents. The system has operations to perform cluster selections. These can be used to pose queries to return images similar to the ones in the the cluster.
An interesting problem addressed in this sutem is that the do not automatically reduce the all the features into a few that can be displayed in two or three dimensions. In other words they do not project a higher dimension feature space down to a lower dimension where information can be potentially lost. In allowing the user to select the dimensions of interest the user can gain an understanding about the spatial relationship in a scatterplot when they are exploring the database.
Most image querying systems rely on a metric to compare images. The intent is to have an image metric that is fast to compute and produces good results even with distorted query images. The work by Salesin, he introduces a multiresolution image querying method that is designed to address this problem. The method is based on wavelets. Wavelets are coefficients achieved by applying a function repeatedly to a matrix that creates signatures for each matrix. Salesin defines an algorithm that uses truncated, quantized versions of the wavelet decompositions (i.e, signatures). These signatures are based on the most significant coefficients computed using a two-dimensional Haar wavelet decomposition. The Haar wavelet decomposition basically averages the rows and columns of each image for sqrt(n) times. This eventually produces the average color of the image, as well as other defining signatures. The signatures are quantized such that +1 is used to represent a large positive value, and -1 is used for large negative values. The authors found that this sped up the search process and reduced storage requirements. This is a multiresolution approach, meaning that queries do not have to have the same resolution as potential targets. This method can be used to search through images that have been compressed using wavelets. This is very important since most databases do not have enough space to store full images. Thus indexes are created and images are kept outside of main memory and disk space.
Salesin uses Haar wavelets in his analysis (Salesin 1995). The main attraction to Haar wavelets is their simplicity. Essentially, Haar wavelets are easy to compute and produce an orthogonal basis (Ueda 1995). The problem with Haar wavelet analysis is that the ease of computation produces more error in image reconstruction due to the blocking effect, where images look like a collection of blocks and lacks the smooth edges that were originally present. Fortunately, in image querying, the user never sees the reconstructed image; the user is only shown actual images that closely match the query image.
Salesin chose standard over non-standard decomposition. Standard decomposition applies wavelet transform to each row then columns while non-standard decomposition applies wavelet transform to rows and columns alternating between them during the process. He reasoned that since non-standard analysis produces square basis functions, non-standard analysis would work better on images with square-shaped objects. Standard analysis, on the other hand, would work best for images with emphasis on lines and other rectangular features. Through experimentation, Salesin found that standard analysis produced better results. Another advantage of standard decomposition is that two-dimensional analysis is composed of two one-dimensional analysis, first horizontally, and then vertically (Stollnitz 1995).
In wavelet image retrieval systems, instead of storing the actual images in the database, each image is decomposed, and the n largest (in magnitude) coefficients for each image are saved. When the user poses a query, the query image is also decomposed, creating a signature of n coefficients. This signature is compared to those in the image database using an image query metric. This metric is faster to compute than the L^1 or L^2 metrics. The query metric assigns variable weights to the coefficients that can be tuned according to the types of images involved in the queries. These weights can be determined using statistical techniques. Salesin simplifies his metric by only considering terms in the query signature that are non-zero. Therefore, queries without a lot of detail can be matched to more detailed images. The reverse is not true - a more detailed query image will not always match a less detailed target image, but this did not pose a problem because their query images will always have at most the same detail as the target images.
Most search systems are static in nature (i.e, what you asked for is what you get). However, the user may want to change his/her query after seeing the system's results. TERPSI makes queries more interactive, by allowing users to add more images to a search key after an initial batch of images have been returned(Jones, 1997). The system retrieves images that it believes are most like the query image, which may not satisfy the user. For this reason, relevance feedback was implemented to improve the quality of the results. TERPSI implemented a relevance feedback system that helped the users fine tune their query specifications. For users who are artistically capable, TERPSI allowed them to draw an image which will act as the query image.
In TERPSI speed, accuracy, and minimizing space requirements were the primary goals. They predicted that much of the speed improvements would come from implementing the simple Haar wavelet decomposition method, and using a small number of wavelet coefficients in the images' signatures would reduce the amount of space required for each image in test database. Relevance feedback allows for eventual increased accuracy, but receiving more accurate initial results must begin with the user's initial query image. Thus, the user interface provided users the opportunity to point out which images they liked and which they didn't. TERPSI used this information for its relevance feedback mechanism.
In TERPSI, the database of images is populated by the Input Component (i.e, webcrawler). The test database was populated, with images from the Caprina Database, located on the inforM server at the University of Maryland. Caprina contains several thousand high-quality images of varied subjects. Once the Input Component finds an image from Caprina, it sends the URL and thumbnail of the image to the Server Component for preprocessing. To implement the preprocessing stage for wavelet decomposition, when the Server Component receives a request to insert an image into the database, the image color histogram is stored, and in a separate database the image is decomposed into wavelet coefficients. This enable us to compare a color histogram query metric with wavelet decomposition.
The vectorizer class in TERPSI was designed to handle different kinds of multimedia objects (images, sounds, videos, etc.), but require that a feature vector be created and that a Euclidean distance be used to determine relevance between two objects. To make it extendible to different search techniques, TERPSI assumes this could be accomplished by supporting a feature vector for each search technique. If all new techniques use a simple feature vector, this technique is appropriate. But not all new searching techniques depend on a simple feature vector. For example, wavelets depend on coefficients created during decomposition. Wavelet decomposition requires more complicated analysis than vector distance. Thus a new vector class called waveletizer class was created to implement the preprocessing technique described in (Salesin 1995).
When a query is posed, the system first decomposes the given image into the coefficient signature. This wavelet vector contains the eighty largest (in magnitude) wavelet coefficients, including the image's average RGB values, and their corresponding indices. The average RGB values are used to search an R-tree. The user specifies that n images are to be returned in response to the query. Initially, TERPSI searches for the 5n nearest neighbors of the query image. These 5n images are then scored using the query metric. The top n images are returned to the user.
In Salesin's work, the GUI allows users to paint an image on a canvas which then becomes formatted for querying against the database. They attached a button to the Client Interface that, when selected, will spawn a paint application, such as XPaint. In this program a user can draw his desired picture, save it, and then use this image as the query.
An important issue in any database is indexing, which can be utilized to optimize queries. In TERPSI low dimensionality vectors that characterize objects are supported. This is important to get fast performance from the indexer when objects are stored and retrieved.
In TERPSI, the average color of an image was the defining characteristic used for indexing. The indexer within TERPSI is a three-dimensional R-tree that uses the average three component color space. The three-dimension RGB color average is added to the 64-dimension color histogram to produce a 67-dimension vector that is stored in the database. Average color was also added to the eighty largest magnitude wavelet coefficient indices for each color channel.
The largest magnitude coefficient can be negative or positive. Therefore, to store these in a simple vector, the coefficients have to be further processed. After finding the largest coefficient indices, the positive coefficients were placed in the first part of the vector and the negative coefficient indices were place in the last part of the vector. To determine where the positive coefficient indices stop and the negative coefficient indices begin, are stored. Thus, the vector had 246 dimensions. IN TERPSI, every image that was gathered from the WWW was scaled to a 128x128 image representation. This was done to compare their results to (Salesin 1995).
Like other image querying systems, TERPSI does not reveal anything about the contents of a image database other than the ones that are currently displayed. Another invariant in TERPSI is that since the wavelet values are quantize to 1 or -1. Therefore weights had to be computed to compare images. Determining these weights can take a long time. Also the user must know what type of images are in the database before hand to get relevant search results. The results are more accurate than just using a color histogram to compare images because wavelets rely on the self similarity of objects within objects. In Wavelet decomposition like Fourier transforms of images, similar frequencies correspond to similar shapes. We try to exploit this in our scatter plot interface to reveal the contents of an image database.
Scatterplots are a mechanism to display data items on the screen on an individual basis. The position of each item on the screen is determined by the attributes currently used to draw a scatterplot and a mapping algorithm from the attribute values to screen positions.
With a scatterplot, an overview of the data is presented to the user. If the user understands how the spatial relationship of the data items relates to their semantic relationship, then the user will understand the distribution of the objects along a certain attribute (axis). A scatterplot can also show the correlation between one attribute and another as present in the data. For certain situations, a scatterplot can show the user whether some data items are "close to" or "far away from" other data items.
A scatterplot is good for presenting 2-D, 3-D, and multi-D data, especially when the distance of the attribute values of the data items corresponds to the semantic differences of the data items along that attribute.
There are different strategies to draw scatterplots. In Spotfire, the position of a data item is always determined by two attributes currently designated for the x- and y- axes. In XGobi, the scatterplot can be drawn along more attributes and then mapped to the 2-D screen. In XmdvTool, the scatterplot is an N X N grid of parallel projections of the data for N-dimensional data. Each grid is a simple xy plot for the two dimensions it represents.
Scatterplots become much more powerful when integrated with dynamic query mechanisms (Ahlberg 1994). Dynamic queries (Ahlberg 1994, Shneiderman 1994, Williamson 1992) form a novel approach for visual query formulation on a collection of data, each of which is characterized by a set of attributes. Queries are stated with visual query components and the effects of the actions taken by the user are immediately reflected on the display. Query results are presented visually and continuously at every step of the query formulation process to give guidance to the user. All of the actions of the user are reversible.
Dynamic queries have been applied in the visualization of multidimensional data (as in the Homefinder (Williamson 1992}, the Filmfinder (Ahlberg 1992), and the tool Spotfire), hierarchical data (Kumar 1994), and network data (Qiu 1995).
This project will combine data visualization and image querying techniques to allow users to browse, search, and filter a collection of images. The image search service will allow search results to be tuned by a users. In addition to returning the n closest images to the search key image a scatter plot view into the index of the image database will be displayed. The n closest images will be highlighted in this view. A click on a dot on the scatter plot will display the highlighted image. With each image there will be links to its collection. Adding filters for image attributes will make the interface capable of performing dynamic querying. This will allow users to browse collections that have similar images which may have different topics associated with the collections and to find relationships between collections not seen before.
The user interacts with the Java applet(see figure1). The applet initially displays a point in the scatter-plot for each image in the database. The applet can send a query request to the server when the query button on the applet is selected by the user. A datagram message is sent to the server. This server can be different machine as the machine that started the java applet. This message contains the url of the image search key and a number n, where n is the number of image urls the applet will receive. The sever returns the n closet matches to the image search key. These images are displayed in the java applet and the corresponding dots in the scatter-plot are highlighted.
Once the server receives a message, the message is decomposed and the URL and the number n are found. The server retrieves the URL and decomposes the image using the preprocessing step discussed in the Database section. Using the B+ tree the 5n candidates are found. The querying metric described in the Database section is applied to these 5n database records and the URLs of the n best are returned to the applet.
Figure 2 illustrates our user interface for this project. The scatter-plot plots a wavelet value against some bibliographic information about the image. The default will be year. Images that when decomposed with Haar wavelet technique have low wavelet indexes numbers among the first three largest wavelet values will be those that have simple figures in them like portraits. While those with high indexes among the first three largest wavelet values will be those images with complicated scenery(see figure 3). So if a person was interested in portraits of people they need only focus on points with low wavelets values. In the lower right hand corner a image will be displayed of the point in which a user clicks on. This image will also be used as a search key to find similar images. The results of these searches will be displayed in the boxes starting on the left hand corner. By placing a cursor on one of these images the appropriate point will be highlighted in the scatter-plot.
The Java interface for this project was developed based on the code for the Baltimore Learning Center (BLC) project (Rose 1997). The BLC user interface reads a data file, stores the data in a tree structure, draws a scatterplot of the data, provides a search facility, and supports user interactions such as changing the attributes for the x- and y- axes. In the data file, the first line specifies the labels for data attributes; the second line specifies the data type for each attribute; the subsequent lines contain the data items. In drawing the scatterplot, when multiple data items need to be drawn at the same position on the screen, the program will draw the dots along a spiral line. When the number of data items at the same position gets too big, the program will draw a bar instead of the dots. When the user moves the mouse close to some data points, a rectangular box is drawn around the points, meaning that they are currently selected. If the user clicks the mouse button at this time, then a preview window will pop up with the list of objects selected and facilities for the user to preview them.
In reusing the BLC code, we felt that the code was well written, with fair amount of useful comments and a clear structure. We were able to use it quite easily to read our data file and store the data in a tree structure. We also used the code for drawing the dots (especially for handling multiple data items at the same screen location) and the box around the dots (when the user moves the mouse cursor close to some dots) with almost no change.
The changes we made and new code we wrote include:
- Change the layout of the interface according to our design.
- The preview for the BLC project was done in a pop-up window. In our project we designated part of the main window for the preview purpose.
- The code of the BLC project only works for data attribute of nominal or ordinal scales in terms of drawing the scatterplot. We modified the code to make it work for data values of the ratio scale. This brought about quite some work both in understanding the original code and making the necessary changes.
- The code for displaying and navigating the preview images was new. The Java runtime support does not guarantee the loading of a full image by a simple function call. This problem was solved by using the Media-Tracker facility. The solution was borrowed from the original BLC code.
- The Java AWT does not have the kind of widget used in Spotfire for changing the value range on each axis. The BLC code does not support range query. Given the limited amount of time, we did not implement the function that allows the user to manipulation the value ranges on the x- or y- axis to filter the data.
- When running the Java applet from Netscape, the applet could not load anything from an outside web site. Since the images are on the Library of Congress web site, if we run the applet from University of Maryland web server, we cannot load the images.
- In the new code, currently selected images are highlighted in the starfield display, and the images in the search results are also highlighted in the starfield display.
- Modifications were made to the BLC code to allow communication with a deamon that gather messages from the server, this envolved significant amount of time to work out these details.
The database used in this project is a modification of the database used in TERPSI. In Terpsi the database uses the RGB color space. Since most of the Library of Congress images are black and white, we were able to exploit this and perform more accurate image querying. Just using gray scale color space we stored more information about one color channel per database image. Because we are using simpler database images, we made changes to the database record.
The TERPSI database record used as a key the average RGB color. In our project, gray scale average color is stored but is not the key. The key used is the average of the first three largest wavelet indexes. The index used in the TERPSI project was a three dimensional R-tree, that used the RGB as is three dimensions. We used the average of the first three largest wavelet indexes as a key into a B+ tree. The TERPSI record stored the 60 largest wavelet indexes and assumed that the values were quantized to 1 or -1. In addition to storing the wavelet indexes we stored the actual values of the wavelet indexes.
Historical data was retrieved from the Library of Congress' Browse and Select Collections prototype site. There were six sets of collections from which we extracted images. We added information about several other collections for the purpose of demonstration.
For each image we extracted attributes such as the title, image URL, collection URL, and the range of years that particular event took place. We even collected the main topics that are associated with each image, the formats that the information is available in (i.e. sound, manuscript), and the places that are associated with the event or person (s) being displayed in the image.
The data we collected had to be stored in a couple of different ways to satisfy the format requirements of the scatterplot and the java code. For this reason, the data collection process was very tedious and time consuming. However, it was very important that we store as much information about an image that would allow us to catalog each one uniquely.
There was one major disappointment to gathering lots of the information about the collections. The bar chart from the previous prototype would only allow approximately 20 records. After about 23-24 records, the x-axis (years) would not display and the scroll bars in the upper panel became disabled. Therefore the majority of our data did not get used for a bar chart display.
Our team wanted to add color the the American Memory prototype for the purpose of possibly offering some further information to the user from the display. The lessons learned from this effort helped us with adding color to our scatterplot display. The team that worked on the prototype before us also considered color-coding schemes for the intervals, but they rejected the idea. They felt that since most of the attributes for the collections were multi-valued, there would not be an appropriate single color code. For example, several of the collections that we used contain manuscript, sound, etc. There is no single color code that we could adopt that would be appropriate.
The American Memory prototype is written in the Java programming language. Therefore if any changes are to be made to the interface, they have to be implemented in Java. We attempted to add color to the bars on the bar graph with respect to topics. In the case where an collection had more than one topic associated with it, we wanted to add color vertically using a different color to correspond to a given topic. For example a bar on the graph would be one color if there was only one topic in its record, while a record with four topics would show as a bar with four colors equally proportioned arranged from left to right.
In the original prototype they also discussed the possibility of adding size to the collection for the y-axis. They explored the options of size by number of items, byte size, and time to evaluate. They disregarded all of these ideas and instead decided on iterating the intervals in a non-overlapping manner down the vertical axis. Thus, the y-axis has no meaning.
Another idea that our team discussed to further enhance our project was the addition of size to the bars in the bar graph display. We discussed the possibility of making the height of the bars proportional to the size of the collection with larger collections having thicker/taller bars. After further studying and experimenting we decided against the variable sizes of the bars. Although we knew it to be false, it appeared as if a longer (along the x-axis) bar with a thickness of four was larger (thicker/taller) than a shorter bar with a thickness of four. While we thought to add size to help our users in their decision-making, we concluded that size would only confuse and possibly add more time to the user's task.
The interface to TERPSI has a couple of drawbacks (See figure 4). The main drawback is that is no mechanism to browse the contents of a database. In order to begin a search a users must know the contents of the database either by typing in a URL of one of the images or draw an image throught XPaint. Futhermore when search results are returned, the user can get no sense of how the their results relate to other images in the database. When all the parts of our system were connected and working in a reasonable fashion, we were able overcome these drawbacks and allowed the user to explore the contents of test database.
When the scatterplot is first displayed, it reveals that there are two clusters in the database. One consist of low wavelet indexes among the first largest wavelet values and the other cluster that has higher wavelet values among the first largest wavelet values. By clicking on a few of the dots in the first cluster you can see that most of these images are portraits of people (See Figure 5). While the images in the second cluster are panoramic images of landscapes (See figure 6).
There were some interesting anomilies in our database. There was an image of a group of men standing out side (See figure 7). This image was far away from other images and after careful examination of the database there were no other images of people standing outside in a similar fashion. Another image depicted a man standing outside a building with had a lot of lines defining the bricks used to construct the building (See figure 8). This was the only image in which there was a building with that type of brick pattern. So like other scatterplot application our project was capable of pinpointing some anomalies in the test database.
One flaw present in our system is that since we are projecting the whole wavelet dimension space of 16000 values down to one which is the average of the largest three. Therefore there will be some error when displaying the images. There were two images that had a man standing next to a horse, which was in a sillohuette stance (See figure 9). These two images should have been display near each other in the scatterplot only after preforming a search did we find where both of these images were located in the scatterplot.
The initial prototype provides an overview of the collections on a timeline, selected filters, and a textual list of the collections. All three displays are tightly coupled with one another.
We did not change any major functionality of the previous prototype. We do not intend for our project to replace the working prototype, but rather to enhance it. Two features of the prototype, namely Browse and Select Collections and Search, rely on fake data for its presentation. We were able to replace this data with "real" information available on the Library of Congress web site.The prototype uses only a few records. To get a real feel for its performance a larger set of documents is needed. Also, the search service relies on keyword search when searching for images. There is no way of using an images as a search key and finding similar images to the search key. Searching with images has not been perfected and user involvement is necessary to get good results. We have been able to offer our solution to the "search by image" problem.
The interface implemented in this project brings together the techniques involving metadata search and image retrieval. It enables the users to locate meta information about an images and draw conclusions about a group of images as well. While other systems have just look at image similarity among a database of images, we have been successful in finding similar images and linking in relevant information about images. In the future it will be important that image database be capable of retrieving similar images but also provide the users with other relevant information about an image.
Rabitti, Fausto and Savino, Pascuale,(Aug. 1992) An information retrieval approach for image databases. VLDB Conference Proceedings, Vancouver, BC, Canada, pp.574-584.
Ahlberg, C. and Shneiderman, B., Visual Information Seeking: Tight Coupling of Dynamic Query Filters with Starfield Displays, Proceedings of the ACM CHI'94 Conference, 1994, pp. 313-317.
Kumar, H., Plaisant, C., and Shneiderman, B., Browsing hierarchical data with multi-level dynamic queries and pruning, International Journal of Human-Computer Studies, 1997, pp. 103-124.
Qiu, C., Shneiderman, B., Graph-based dynamic query interfaces and a prototype implementation.
Shneiderman, B. (July 1996) The eyes have it: A task by data type taxonomy for information visualizations Proc. 1996 IEEE, Visual Languages (Boulder, CO, Sept.3-6,1996) 336-343.
Shneiderman, B. (Jan. 1993) Dynamic Queries: for visual information seeking, IEEE Software, vol. 11, #6 (Nov. 1994) 70-77.
Williamson, C. and Shneiderman, B., The Dynamic Home Finder: Evaluating Dynamic Queries in a Real-Estate Information Exploration System, Proceedings of ACM SIGIR '92 Conference, 1992, pp. 338-346.
Salesin, David H., Jacobs, Charles E., and Finkelstein, Adam (1995), Fast Multiresolution Image Querying, Proceedings of SIGGRAPH 95.
Ueda, Masami and Lodha, Suresh (Jan. 1995) Wavelets: An Elementary Introduction and Examples,UCSC-CRL 94-47.
Stollnitz, Eric J., DeRose, Tony D., and Salesin, David H., (May 1995) Wavelets for Computer Graphics: A Primer, IEEE Computer Graphics and Applications, 15(3), pp76-84.
Gupta, Amarnath, Santini, Simone, and Jain, Ramesh,(dec 1997). In Search of Information in Visual Media, CACM, 40(12), pp.35-42.
Jones, Ryan, Muise, Heather, Berk, Barry, and Ding, Jian, Applying Wavelet Decomposition and Relevance Feedback to Image Querying, Final report for CMSC 724 class.
Marchionini, Gary, Plaisant, Catherine, Bruns, Tom, Cronnell, Teresa, Komldi, Anita, Nation, David, Shirinian, Ara, Shneiderman, Ben, http://www.cs.umd.edu/projects/hcil/Research/1995/ndldemo/draft11/notesbrowse.html