Taro Kuriyama & Pedro Moura  |  CS171: Visualization - Final Project  |   May 2009

The Tree of Life (According to Google)


Project Overview

In recent years, a variety of projects have attempted to document and visualize the tree of life. The University of Arizona, for example, hosts the Tree of Life Web Project (TOL), which aims not only to document the tree of life but also to centralize the large amount of historical and genetic information available on the internet. Researchers at the University of Texas have also created a visualization of an arbitrarily chosen, representative sample of the tree of life (visit the page here).

From a human perspective, some species in the tree of life are more important than others. This importance can be defined both historically (the line from which we descend) and scientifically (the species that we study the most). Our project visualizes the latter, treating Google as a representative sample of widely available human knowledge and the number of Google search results as an indicator of relative importance in the tree of life. We thus attempt to answer (or enable one to answer):

Data Acquisition

Given the immensity of the tree of life as documented on the TOL site (75,000+ nodes), we decided to focus on a small part of the tree, namely Eubacteria and its descendants. The tree is organized cladistically and does not contain the end nodes of species, so we ended up with a manageable 750+ nodes. To scrape the nodes and query Google, we wrote a Python script using the Beautiful Soup library. We also downloaded the XML file for Eubacteria from the TOL site's web services and parsed it using Python's xml.sax package. (In retrospect, it would have been easier to work directly with the XML file from the outset.)
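
To illustrate the xml.sax step, here is a minimal sketch of a streaming handler that recovers the parent/child structure of a clade. The element names (TREE, NODE, NAME, NODES), the ID attribute, the sample IDs, and the CladeHandler class are assumptions made for illustration; the actual TOL export is richer, and this is not the script used in the project.

```python
import xml.sax

# Hypothetical, heavily simplified sample; the real TOL XML has more
# elements and attributes, and these IDs are illustrative only.
SAMPLE_XML = """<TREE>
  <NODE ID="2">
    <NAME>Eubacteria</NAME>
    <NODES>
      <NODE ID="2290"><NAME>Cyanobacteria</NAME></NODE>
      <NODE ID="2302"><NAME>Proteobacteria</NAME></NODE>
    </NODES>
  </NODE>
</TREE>"""


class CladeHandler(xml.sax.ContentHandler):
    """Collect one record per NODE element while streaming the XML."""

    def __init__(self):
        super().__init__()
        self.records = []      # all nodes, in document order
        self._stack = []       # NODE elements that are currently open
        self._in_name = False  # True while inside a NAME element

    def startElement(self, tag, attrs):
        if tag == "NODE":
            # The innermost open NODE, if any, is this node's parent.
            parent = self._stack[-1]["id"] if self._stack else None
            node = {"id": attrs.get("ID"), "parent": parent, "name": ""}
            self._stack.append(node)
            self.records.append(node)
        elif tag == "NAME" and self._stack:
            self._in_name = True

    def characters(self, content):
        # Text may arrive in several chunks, so append rather than assign.
        if self._in_name:
            self._stack[-1]["name"] += content

    def endElement(self, tag):
        if tag == "NAME":
            self._in_name = False
        elif tag == "NODE":
            self._stack.pop()


handler = CladeHandler()
xml.sax.parseString(SAMPLE_XML.encode("utf-8"), handler)
for record in handler.records:
    print(record["id"], record["parent"], record["name"])
```

Because SAX reports elements as they open and close, a stack of open nodes is enough to attach each node to its parent without loading the whole tree into memory first.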


Visualization Approach

Our original plan was to visualize each interactive "level" of the Eubacteria branch using a Voronoi treemap. However, implementing Lloyd's algorithm to weight the cells proved beyond the scope of this project. Instead, we wrote a Python script to prepare a squarified treemap based on the tiling algorithm described here. Among rectangular treemap layouts, the squarified treemap has the advantage of keeping tile aspect ratios close to one, which makes areas easier to compare. The squarified treemap is also conveniently ordered (in our case, by the number of Google hits in descending order).
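
The following is a minimal sketch of a squarified layout in the style of the published algorithm by Bruls, Huizing, and van Wijk. It may differ from the tiling description linked above and from our own script. The function names (worst_ratio, layout_row, squarify) and the sample weights are chosen for illustration only.

```python
def worst_ratio(row, side):
    """Worst aspect ratio among tiles if `row` (areas) fills a strip along `side`."""
    total = sum(row)
    return max(side * side * max(row) / (total * total),
               total * total / (side * side * min(row)))


def layout_row(row, x, y, width, height, rects):
    """Place `row` as a strip along the shorter side; return the free rectangle left."""
    total = sum(row)
    if width >= height:
        # Vertical strip on the left edge, tiles stacked top to bottom.
        strip = total / height
        offset = y
        for area in row:
            h = area / strip
            rects.append((x, offset, strip, h))
            offset += h
        return x + strip, y, width - strip, height
    # Horizontal strip on the top edge, tiles placed left to right.
    strip = total / width
    offset = x
    for area in row:
        w = area / strip
        rects.append((offset, y, w, strip))
        offset += w
    return x, y + strip, width, height - strip


def squarify(weights, x, y, width, height):
    """Return one (x, y, w, h) tile per positive weight, largest weight first."""
    weights = sorted((w for w in weights if w > 0), reverse=True)
    if not weights:
        return []
    # Scale weights so that the tile areas sum to the area of the canvas.
    scale = width * height / sum(weights)
    areas = [w * scale for w in weights]
    rects, row, i = [], [], 0
    while i < len(areas):
        side = min(width, height)
        candidate = row + [areas[i]]
        # Keep growing the row while doing so does not worsen its worst tile.
        if not row or worst_ratio(candidate, side) <= worst_ratio(row, side):
            row = candidate
            i += 1
            continue
        # Otherwise fix the current row and start a new one in the leftover space.
        x, y, width, height = layout_row(row, x, y, width, height, rects)
        row = []
    layout_row(row, x, y, width, height, rects)
    return rects


# Illustrative weights only (not real hit counts), on a 6 x 4 canvas.
tiles = squarify([6, 6, 4, 3, 2, 2, 1], 0, 0, 6, 4)
for tile in tiles:
    print(tile)
```

Sorting the weights in descending order before tiling is what yields the ordering by hit count mentioned above. The layout is a greedy heuristic, so the aspect ratios it produces are good in practice but not guaranteed to be optimal.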

The final rendering was written in Java using Processing. Users may switch between visualizations of Google and Google Scholar results; they may also click a node to display information about it and double-click to zoom in on its children. Because some maps had many nodes--and consequently very small tiles--we implemented a linked list that also lets users navigate the tree by text.
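
The click and zoom behavior can be sketched independently of the rendering code. The example below is written in Python rather than Java, and it is an illustration, not the applet's implementation: the tile_at function, the TreemapNavigator class, the zoom_out method, and the sample tiles are all hypothetical.

```python
def tile_at(tiles, px, py):
    """Return the name of the tile containing point (px, py), or None."""
    for name, (x, y, w, h) in tiles.items():
        if x <= px < x + w and y <= py < y + h:
            return name
    return None


class TreemapNavigator:
    """Track which node of the tree is currently shown as the treemap root."""

    def __init__(self, children, root):
        self.children = children  # node name -> list of child names
        self.path = [root]        # breadcrumb from the root to the current node

    def zoom_in(self, name):
        """Descend into `name` if it is a child of the current node and has children."""
        current = self.path[-1]
        if name in self.children.get(current, []) and self.children.get(name):
            self.path.append(name)
            return True
        return False

    def zoom_out(self):
        """Go back up one level, never above the root."""
        if len(self.path) > 1:
            self.path.pop()


# Hypothetical tree fragment and tile layout for the current level.
children = {
    "Eubacteria": ["Cyanobacteria", "Proteobacteria"],
    "Proteobacteria": ["Alphaproteobacteria", "Betaproteobacteria"],
}
tiles = {"Proteobacteria": (0, 0, 4, 4), "Cyanobacteria": (4, 0, 2, 4)}

nav = TreemapNavigator(children, "Eubacteria")
clicked = tile_at(tiles, 1.5, 2.0)   # single click: look up the tile under the cursor
if clicked is not None:
    nav.zoom_in(clicked)             # double-click: descend into that tile
print(clicked, nav.path)
```

A text list of node names can drive the same zoom_in call, which is one way a list and the treemap could stay in sync when tiles are too small to click.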

>> Click here to see the Java applet in action. (2017 update: Java applet no longer runs in Chrome)





taro.kuriyama (at) gmail.com | pmavfmoura (at) gmail.com