1. A brief introduction to phylogenetic reconstruction.
o Datatypes.
o MP.
o ML.
o NJ/ME.
o Bayesian inference.
o Heuristic vs. exact methods.
o How do we use phylogenies?
We introduce the book with a summary of contemporary phylogenetic reconstruction methods. The summary will not be exhaustively detailed, nor will the different phylogenetic methods be covered in any great mathematical depth. The aim is to re-acquaint readers with standard phylogenetic methods. In addition, we will discuss the preparation of data for phylogenetic analyses.
2. Statistical inference.
o Estimation.
o Hypothesis testing.
o Randomization methods.
The fundamental concepts of statistical inference – hypothesis testing, power, Type I and II errors, least-squares and likelihood-based inference – will be discussed. Since many statistical phylogenetic methods involve the use of randomization tests, we will cover the rationale of randomization methods, discuss resampling methods with and without replacement, and talk about Monte Carlo methods.
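As a rough illustration of the difference between resampling with and without replacement, the following Python sketch applies both to a toy alignment. The alignment, the function names, and the choice to permute states among taxa within each site are illustrative assumptions, not procedures prescribed by the book.

```python
import random

def bootstrap_sites(alignment, rng):
    """Resample alignment sites WITH replacement (a bootstrap replicate)."""
    n_sites = len(alignment[0])
    picks = [rng.randrange(n_sites) for _ in range(n_sites)]
    return ["".join(seq[i] for i in picks) for seq in alignment]

def permute_within_sites(alignment, rng):
    """Shuffle states among taxa within each site (WITHOUT replacement)."""
    n_taxa = len(alignment)
    n_sites = len(alignment[0])
    shuffled_sites = []
    for i in range(n_sites):
        site = [seq[i] for seq in alignment]
        rng.shuffle(site)  # same states, new assignment to taxa
        shuffled_sites.append(site)
    return ["".join(shuffled_sites[i][t] for i in range(n_sites))
            for t in range(n_taxa)]

if __name__ == "__main__":
    # Hypothetical four-taxon alignment, for illustration only.
    toy = ["ACGTAC", "ACGTTC", "AGGTAC", "TCGAAC"]
    rng = random.Random(42)  # fixed seed so the run is repeatable
    print(bootstrap_sites(toy, rng))
    print(permute_within_sites(toy, rng))
```

A bootstrap replicate draws whole sites with replacement, so some sites appear more than once and others not at all. The permutation keeps the state composition of every site but breaks any association among sites.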
3. Is there any phylogenetic signal in my data?
o Saturation.
o Ts:Tv ratios.
o Randomization tests (PTP, T-PTP).
o Skewness statistics.
Until recently, most datasets used in phylogenetic analyses had enough phylogenetic signal to warrant their use. However, with more genomic information available, and deeper lineages appearing on the tree of life, many datasets have lost significant phylogenetic signal. Consequently, tests of phylogenetic signal, which for a while were used only infrequently, are now important.
4. Are my data tree-like?
o Site pattern analysis.
o Trees vs. splits-graphs.
If we are to build a phylogenetic tree, then we need to be certain that a tree is the best way to represent the evolutionary history of the organisms in question. How do we determine this?
5. Should I combine data from different datasets?
o Partition tests.
o Tests of recombination.
With the large amounts of genetic and genomic data now available, there is the opportunity to concatenate several genes, even whole genomes, for any phylogenetic analysis. But is this an appropriate approach – do all genes have the same evolutionary history? Has there been recombination, lateral transfer or gene conversion? How do we identify these, and how do we identify regions that evolve at different rates?
6. What is the best model of evolution for my data?
o LRT of models.
o AIC, BIC.
o Do I have a molecular clock?
o LRT.
o Relative rates tests.
Models of evolution are at the heart of certain statistical phylogenetic procedures, including ML and Bayesian estimation of phylogenies, and any downstream tests of evolutionary hypotheses. Choice of the appropriate model is a critical first step for these analyses.
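The model-comparison criteria listed for this chapter reduce to simple arithmetic on fitted log-likelihoods. The sketch below is a minimal illustration; the log-likelihood values are made up, the alignment length is assumed to serve as the sample size in BIC, and the p-value helper covers only the case of one extra free parameter.

```python
import math

def aic(log_lik, n_params):
    """Akaike information criterion: 2k - 2 lnL (smaller is better)."""
    return 2 * n_params - 2 * log_lik

def bic(log_lik, n_params, n_sites):
    """Bayesian information criterion: k ln(n) - 2 lnL (smaller is better)."""
    return n_params * math.log(n_sites) - 2 * log_lik

def lrt_statistic(log_lik_null, log_lik_alt):
    """Likelihood ratio statistic for a null model nested in the alternative."""
    return 2 * (log_lik_alt - log_lik_null)

def chi2_sf_df1(x):
    """Upper-tail probability of a chi-square variable with 1 degree of freedom."""
    return math.erfc(math.sqrt(x / 2))

if __name__ == "__main__":
    # Hypothetical fits on the same tree; counts exclude branch lengths,
    # which are shared by both models and cancel in the comparisons.
    n_sites = 1000
    lnl_jc69, k_jc69 = -2500.0, 0  # JC69: no free substitution parameters
    lnl_k80, k_k80 = -2490.0, 1    # K80: one free parameter (Ts:Tv ratio)

    print("AIC:", aic(lnl_jc69, k_jc69), aic(lnl_k80, k_k80))
    print("BIC:", bic(lnl_jc69, k_jc69, n_sites), bic(lnl_k80, k_k80, n_sites))
    stat = lrt_statistic(lnl_jc69, lnl_k80)
    print("LRT statistic:", stat, "p-value (df = 1):", chi2_sf_df1(stat))
```

The chi-square approximation used here applies to nested models whose extra parameter is not fixed at a boundary of its range; other cases need a different reference distribution.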
7. How good is my tree?
o CI, RI, HI.
o Bootstrapping and Jackknifing.
o Bremer decay.
o Zero length branch tests.
o What about long-branch attraction?
Once a tree is built, how confident can we be that it is a faithful representation of evolutionary history?
8. How do I use my tree(s) to test different topological hypotheses?
o KH test.
o SH test.
o SOWH test.
o AU test.
o RKB3 test.
o Bayesian tests.
o Phylogenetic clustering.
o Nested clade analysis.
Frequently, we may want to know whether alternative hypotheses of evolutionary relationships are equally supported by the data. How are we to test these alternatives against the most likely or most probable hypothesis that the data return? How do we avoid the trap of multiple hypothesis testing when there are many alternatives? How do we detect whether there are significant spatial patterns in our data?
9. How do I test hypotheses about phenotypic evolution?
o Reconstruction of ancestral states.
o Phylogenetic contrasts and phylogenetic regression.
Since the mid-1980s, phylogeneticists have realised that raw, uncorrected correlations of phenotypes (e.g., body size and brain size) can produce spurious results because of the underlying evolutionary relatedness of organisms. How do we correct our analyses to take account of this? How do we map the evolutionary trajectories of the characters we are interested in, and how do we know if one trajectory is significantly better than another?
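One standard correction for relatedness is to replace the raw tip values with phylogenetically independent contrasts. The sketch below follows the usual recursive calculation under a Brownian-motion assumption; the nested-tuple tree encoding, the function name, and the trait values are hypothetical and chosen only for illustration.

```python
import math

def independent_contrasts(node, trait):
    """Return (node value, added branch length, standardized contrasts).

    A leaf is a taxon name (str). An internal node is a tuple
    (left, left_branch_length, right, right_branch_length).
    """
    if isinstance(node, str):
        return trait[node], 0.0, []
    left, v_left, right, v_right = node
    x_l, extra_l, c_l = independent_contrasts(left, trait)
    x_r, extra_r, c_r = independent_contrasts(right, trait)
    # Lengthen each branch to reflect uncertainty in the estimated node value.
    v_l = v_left + extra_l
    v_r = v_right + extra_r
    # Contrast between the two descendants, scaled by its expected std. dev.
    contrast = (x_l - x_r) / math.sqrt(v_l + v_r)
    # Node value: average of the descendants weighted by inverse branch length.
    value = (x_l / v_l + x_r / v_r) / (1 / v_l + 1 / v_r)
    extra = v_l * v_r / (v_l + v_r)
    return value, extra, c_l + c_r + [contrast]

if __name__ == "__main__":
    # Hypothetical tree ((A:1, B:1):1, C:2) with made-up trait values.
    tree = (("A", 1.0, "B", 1.0), 1.0, "C", 2.0)
    body_size = {"A": 1.0, "B": 3.0, "C": 5.0}
    _, _, contrasts = independent_contrasts(tree, body_size)
    print(contrasts)  # one contrast per internal node
```

Contrasts computed in the same way for two traits can then be related by a regression through the origin, rather than correlating the raw tip values.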
10. How do I detect selection?
o Codon models.
o Lineage-specific selection.
o Detecting selection on non-coding regions.
In the last 10–15 years, major strides have been made in developing methods to identify selection acting on molecular sequences, either in particular parts of sequences or along particular phylogenetic lineages. These statistical techniques are arguably among the most interesting and flexible, and hold the greatest promise for biological insight.
11. How can I use phylogenetic information in comparative genomics?
o Database searching.
o Functional assignment.
The relatively new field of phylogenomics focuses on the use of phylogenetic methods in comparative genomics studies. What can phylogenetic relatedness tell us about the likely function of homologs, and how do we test whether these assignments are statistically significant? Can we improve our database searches by incorporating phylogenetic information?
12. Future developments.
o Supertree and supermatrix methods.
o DNA barcoding.
o Bayesian methods.
In this concluding chapter, we cover other topics that warrant mention, but for which the development of statistical methods remains in its infancy.