Reconstructing the Tree of Life
One key issue in reconstructing the Tree of Life is the development of algorithms and computational infrastructure to allow scientists around the world to apply the same methods.
A common approach is to identify the simplest hypothesis of relationships that explains as much of the evidence as possible, a criterion known as parsimony. Increasingly, however, scientists prefer the tree under which the observed species data are most likely, given an underlying model of the evolutionary process (the maximum likelihood approach).
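To make the likelihood criterion concrete, here is a minimal sketch of how the likelihood of a single tree might be computed. It assumes the Jukes-Cantor model of DNA substitution, chosen here only because it is the simplest such model; the text does not name a model. The tree encoding, the branch lengths, the toy alignment, and all function names are illustrative assumptions.

```python
import math

BASES = "ACGT"

def jc69_prob(from_base, to_base, t):
    """Jukes-Cantor probability of ending at to_base after a branch of length t
    (expected substitutions per site), having started at from_base."""
    decay = math.exp(-4.0 * t / 3.0)
    return 0.25 + 0.75 * decay if from_base == to_base else 0.25 - 0.25 * decay

def partials(node, site, alignment):
    """Likelihood of the data below `node`, for each possible base at `node`.

    A leaf is a species name; an internal node is a tuple
    (left_child, left_length, right_child, right_length).
    """
    if isinstance(node, str):
        observed = alignment[node][site]
        return {b: 1.0 if b == observed else 0.0 for b in BASES}
    left, t_left, right, t_right = node
    below_left = partials(left, site, alignment)
    below_right = partials(right, site, alignment)
    result = {}
    for parent in BASES:
        via_left = sum(jc69_prob(parent, b, t_left) * below_left[b] for b in BASES)
        via_right = sum(jc69_prob(parent, b, t_right) * below_right[b] for b in BASES)
        result[parent] = via_left * via_right
    return result

def log_likelihood(tree, alignment):
    """Sum of per-site log-likelihoods, assuming equal base frequencies at the root."""
    n_sites = len(next(iter(alignment.values())))
    total = 0.0
    for site in range(n_sites):
        at_root = partials(tree, site, alignment)
        total += math.log(sum(0.25 * at_root[b] for b in BASES))
    return total

# Hypothetical toy data: four species, three aligned DNA sites.
alignment = {"A": "AAG", "B": "AAG", "C": "CTG", "D": "CTA"}
tree_1 = (("A", 0.1, "B", 0.1), 0.1, ("C", 0.1, "D", 0.1), 0.1)  # groups A with B
tree_2 = (("A", 0.1, "C", 0.1), 0.1, ("B", 0.1, "D", 0.1), 0.1)  # groups A with C

for name, tree in [("tree_1", tree_1), ("tree_2", tree_2)]:
    print(name, round(log_likelihood(tree, alignment), 3))
```

With these toy sequences, tree_1 receives the higher log-likelihood because it groups the species that share the same bases. A real analysis would also optimize the branch lengths and model parameters for each candidate tree, which this sketch leaves fixed.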
But finding the simplest or the most likely hypothesis can be very challenging. As phylogenetic datasets grow larger, it becomes more difficult to analyze them properly. With more and more species under study, the number of alternative phylogenetic hypotheses that must be considered to select the best tree increases dramatically.
For example, for 3 species there are just 3 possible rooted, bifurcating trees, and for 5 species there are 105. From there the number of possible trees grows extremely quickly: for n species it is the product of the odd numbers from 3 up to 2n - 3. For 50 species there are about 2.75 x 10^76 possible trees, a figure approaching common estimates of the number of atoms in the observable universe. For 100 species there are more trees than the volume of the entire universe measured in the smallest possible units, assuming expansion at the speed of light since the “big bang” about 13.8 billion years ago.
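The counts above can be reproduced with a few lines of code. The sketch below assumes they refer to rooted, bifurcating trees of n labeled species, for which the number of trees is the product 3 x 5 x ... x (2n - 3); the function name is illustrative.

```python
def count_rooted_trees(n_species: int) -> int:
    """Number of distinct rooted, bifurcating trees for n labeled species.

    Each added species can attach to any branch of the existing tree, which
    gives the product of the odd numbers 3, 5, ..., 2n - 3.
    """
    count = 1
    for odd in range(3, 2 * n_species - 2, 2):
        count *= odd
    return count

print(count_rooted_trees(3))  # 3
print(count_rooted_trees(5))  # 105

# Python integers have arbitrary precision, so large cases are exact.
for n in (10, 20, 50, 100):
    print(f"{n:>3} species: {count_rooted_trees(n):.2e} trees")
```

Counting the trees is cheap; it is scoring each of them against the data that quickly becomes impossible.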
No computer, no matter how powerful, can examine every possible tree for even a moderate number of species. Computer scientists have therefore devised clever strategies, so-called heuristic search algorithms, that look for good trees without examining every possible one.
One method quickly builds a starting tree and then rapidly swaps branches around to find better trees.
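A minimal sketch of this build-then-swap idea follows. It makes several assumptions that the text does not specify: the starting tree is built by simply adding species one at a time, the branch swap is the nearest-neighbor interchange, and trees are scored by parsimony (fewest changes, counted with Fitch's method). All function names and the toy alignment are illustrative.

```python
def parsimony_length(tree, alignment):
    """Fewest changes the tree needs to explain the alignment (Fitch's method).

    A leaf is a species name; an internal node is a pair (left, right).
    """
    def fitch(node, site):
        if isinstance(node, str):
            return {alignment[node][site]}, 0
        left_set, left_cost = fitch(node[0], site)
        right_set, right_cost = fitch(node[1], site)
        shared = left_set & right_set
        if shared:
            return shared, left_cost + right_cost
        # No state in common: one change is needed on a branch below this node.
        return left_set | right_set, left_cost + right_cost + 1

    n_sites = len(next(iter(alignment.values())))
    return sum(fitch(tree, site)[1] for site in range(n_sites))

def starting_tree(species):
    """Quickly build a ladder-shaped tree by adding species in the given order."""
    tree = species[0]
    for name in species[1:]:
        tree = (tree, name)
    return tree

def neighbors(tree):
    """Yield every tree that is one nearest-neighbor interchange away."""
    if isinstance(tree, str):
        return
    left, right = tree
    if not isinstance(left, str):
        a, b = left
        yield ((a, right), b)
        yield ((b, right), a)
    if not isinstance(right, str):
        c, d = right
        yield ((left, c), d)
        yield ((left, d), c)
    for alt in neighbors(left):
        yield (alt, right)
    for alt in neighbors(right):
        yield (left, alt)

def hill_climb(species, alignment):
    """Keep accepting the first swap that shortens the tree until none does."""
    best = starting_tree(species)
    best_len = parsimony_length(best, alignment)
    improved = True
    while improved:
        improved = False
        for candidate in neighbors(best):
            length = parsimony_length(candidate, alignment)
            if length < best_len:
                best, best_len, improved = candidate, length, True
                break
    return best, best_len

# Hypothetical toy data: four species, three aligned DNA sites.
alignment = {"A": "AAG", "B": "AAG", "C": "CTG", "D": "CTA"}
print(hill_climb(["A", "C", "B", "D"], alignment))
```

On this toy alignment the starting tree needs 5 changes and a single swap brings that down to 3. A search like this stops at the first tree that no single swap can improve, which is not guaranteed to be the best tree overall.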
Another strategy breaks large problems down into smaller ones, solves these, and then puts them back together again.
Much remains to be done to improve the performance of phylogenetic methods.