SplitsTree: A Program for Analyzing and Visualizing Evolutionary Data ===================================================================== Overview ======== Based on the split decomposition method due to H.-J. Bandelt and A.W.M. Dress, it takes as input a distance matrix or a set of aligned sequences and produces as output a graph that represents the evolutionary relationships between the taxa. For ideal data, this graph is a tree, whereas less ideal data gives rise to a more or less tree-like network that can be interpreted as possible evidence for different and conflicting phylogenies. The program supports a number of distances transformations, the computation of parsimony splits, spectral analysis and bootstrapping. After a review of the concepts of splits, split graphs and the method of split decomposition, this file describes the SplitsTree program in detail. Introduction ============ Evolutionary relationships between taxa are most often represented as phylogenetic trees, and many different algorithms for tree con- struction have been developed (Swofford et al. 1996). This is, of course, justified by the assumption that evolution is a branching or tree-like process. However, a set of real data often contains a num- ber of different and sometimes conflicting signals and thus does not always clearly support an unique tree. To address this problem, H.-J. Bandelt and A.W.M. Dress (1992) de- veloped the method of split decomposition. In contrast to methods such as maximum parsimony and maximum likelihood that recon- struct phylogenetic trees by optimizing parameters, split decom- position is a transformation-based approach. Essentially, evolution- ary data is transformed or, more precisely, "canonically decom- posed", into a sum of "weakly compatible splits" and then repre- sented by a so-called splits graph. For ideal data, this is a tree, whereas less ideal data will give rise to a tree-like network that can be interpreted as possible evidence for different and conflicting phylogenies. Further, as split decomposition does not attempt to force data onto a tree, it can provide a good indication of how tree- like given data is. There exist efficient algorithms for performing split decomposition (Bandelt and Dress 1992) and for computing splits graphs (Wetzel 1995 and D.H. Huson, in preparation). A.W.M. Dress and R. Wetzel produced a simple implementation of split decomposition (Wetzel 1995) as a investigative tool to help develop the general theory. Based on their work, a first public version was developed by R. Wetzel and the author (SplitsTree version 1). The program de- scribed in this paper (SplitsTree version 2) is a completely new implementation that was recently produced by the author. In this file we first review the concepts of splits, splits graphs and the method of split decomposition, and then discuss the SplitsTree program in detail. For a number of biological applica- tions of the split decomposition method, see e.g.: Bandelt and Dress 1992b, Dopazo et al. 1993, Dress and Wetzel 1993, Lockhart et al, 1995, Wetzel 1995, Dress et al. 1996, McLenachan et al. 1996, or P.J. Lockhart et al., in preparation. Splits and Splits Graphs ======================== Evolutionary relationships are generally represented by a phyloge- netic tree, T, that is, a tree whose leaves are labeled by a set X of taxa and whose remaining vertices are unlabeled and of degree at least 3. (We only consider unrooted trees in this paper.) Any edge e of T defines a split S={A,A'} of X, i.e. a partitioning of X into two non-empty sets A and A', consisting of all taxa on the one side, or the other, of e in T. Such a system S of splits is called compatible, if for any two splits S1={A1,A'1} and S2={A2,A'2} in S, one of the four intersections A1 n A2, A1 n A'2, A'1 n A2, or A'1n A'2 is empty. Any phylogenetic tree T gives rise to a compatible split system S . In 1971, P. Buneman established that, vice versa, any compatible split system S corresponds to a unique phylogenetic tree T (Buneman 1971). So, tree reconstruction for a given set of taxa X is equivalent to computing a compatible system of splits S for X and determining a weight for each split S that corresponds to the length of the associated edge. Hence, to obtain more general graphs, one must consider less re- stricted systems of splits. Let X be a set of taxa. A system of splits S of X is called weakly compatible, if for any three splits S1, S2, S3 and all Ai in Si (i=1,2,3), at least one of the four intersections A1 n A2 n A3, A1 n A'2 n A'3, A'1 n A2 n A'3, or A'1 n A'2 n A3 is empty (Bandelt and Dress 1992). So, in particular, any two splits are permitted to be incompatible. Intermediately, S is called circu- lar, if there exists an ordering x1,x2,...,xm of the taxa such that for every split SÎS there exists AÎS with A={xp,xp+1,...,xq} and1£p£q£m. One can prove that a circular split system is always weakly compat- ible and a compatible split system is always circular. A splits graph representing a weakly compatible split system S is a graph G(S)=(V,E) whose vertices vÎV are labeled by the set of taxa X and whose edges eÎE are straight line segments that represent the splits in S, see Figure 1. More precisely, each split S={A,A'}ÎS is represented by a band of parallel edges of equal length in such a way that deleting all edges in such a band partitions the graph into precisely two components, one containing all vertices labeled by taxa in A and the other containing all vertices labeled by taxa in A'. The length of the edges representing a given split S indicates its weight or support and is calculated as the isolation index of S. For algorithms that compute splits graphs, see (Wetzel 1995 and D.H. Huson, in preparation). Consider a weakly compatible system of splits S of a set X of taxa. If S is compatible, then G(S) is a phylogenetic tree. What is the situa- tion if S is merely circular? Then G(S) can be realized as a planar graph (Wetzel 1995 and D.H. Huson, in preparation). Finally, if S is not circular, then in general G(S) will not be planar. In biological applications, the arising split systems are often either circular or mildly non-circular. Split Decomposition =================== Split decomposition is a method for obtaining a system of weakly compatible splits with weights from a given set of evolutionary data. So, assume we are given a set of taxa X and a distance map d:X*X -> R>=0 on X, i.e. a matrix representing the evolutionary dis- tances between pairs of taxa. H.-J. Bandelt and A.W.M. Dress (1992) showed that such a distance map d has the following unique de- composition: d=S aS×dS + d0. Here, we sum over all possible splits S of X; the map dS:X*X -> R>=0 is the split metric on S that equals 1, if x and y lie on different sides of S, and 0, otherwise; the number aS ³ 0 is the weight or isolation- index of the split S; and the map d0:X*X -> R>=0 is the so-called split prime residue and cannot be decomposed further. A split S with aS >0 is called a d-split, and the system S of all d-splits is weakly com- patible and can be computed efficiently, see (Bandelt and Dress 1992) for details. If there is no split prime residue, then the distance between any two taxa x and y is precisely equal to the sum of weights of all d- splits that separate x and y, and thus proportional to the sum of all edge lengths along a shortest path from x to y in the splits graph. However, in general, the split prime residue will be positive and so the sum of weights will only give an approximation (from below) of the original distances. The fit of the approximation is measured by the sum of all approximated distances divided by the sum of all original distances. In biological applications, the fit is often quite high and a small split prime residue can be considered as "noise". If we are given a set of aligned sequences, then to apply split de- composition we must first compute a distance matrix d using an ap- propriate distance transformation. Alternatively, one can compute the so-called parsimony splits, or p-splits, directly from the se- quences, as described in (Bandelt and Dress 1993). Yet another possibility is to use spectral analysis (Hendy and Penny 1993, Szekely et al. 1993 and M.D. Hendy and P.J. Waddell, in preparation) to assign a weight (the so-called g-value) to each possible split of X. One can then greedily extract a weakly compatible (or compatible) system of splits, i.e. by considering all such splits S in decreasing order of weight and in- serting the split S into S if it is weakly compatible (or compatible) with all splits already in S. Description of SplitsTree ========================= SplitsTree is a command line driven program for analyzing and visualizing evolutionary data. It takes as input a file containing sequences, distances, or a system of splits and produces as output a weakly compatible system of splits and a splits graph representing the given data. It contains a number of transformations to obtain distances from sequences and methods for obtaining compatible or weakly compatible split systems from distances or sequences. In most cases, the user will open a file using the open file; command and then use the assume command to specify which caculations are to be peformed. Then, the draw command will be typed to see a picture of the computed splits graph, the show command will be used to inspect the different blocks of data computed and then perhaps the save filename; command will be used to save some of the data. List of Commands ================ The program offers the following commands: open filename; save [blocks] filename; show [{blocks|keywords}]; assume assumption; draw [geometry commands]; bootstrap number-of-runs; ps [filename]; shell {command|"commands"}; help [name]; title name; info; version, thanks, close; quit; import REAP filename; open filename; - Read input from a file in NEXUS format save [blocks] filename; - Save blocks to a file. If no blocks listed, saves all valid blocks show [blocks]; - Show valid blocks on screen show {TREE | TREETOP}; - Show tree (or tree topology) in Newick's 8:45 standard format assume assumption; - Enter an assumption, i.e. any legal ST_ASSUMPTIONS-block command. This is the main command used to control which computations are performed draw [draw-commands]; - Draw the current splitsgraph using ghostview. If the environment variable SPLITSVIEWER is set to a PS viewer, uses that viewer instead. A list of all draw commands can be found in the DRAW part of the ST_GRAPH block bootstrap number-of-runs; - Compute bootstrap support ps [filename]; - Save the current splits graph in PostScript shell command; - Execute a shell command help [name]; - Provides online help on the named subject title name; - Give current document a new title info; - Show general information about current document version; - Show version of program thanks; - Show thanks quit; - Quit program File Format =========== SplitsTree is based on the new nexus format (Maddison et al. 1995), which was originally developed for the programs PAUP (Swofford 1997) and MacClade (Maddison and Maddison1989). Input data is described in the three standard block types: taxa, characters and distances. More precisely, an input file will typi- cally consist of a taxa block listing the names of the given taxa and either a characters block containing a set of e.g. DNA-, RNA-, pro- tein-, or RFLP sequences, or a distances block containing a distance or dissimilarity matrix. In Figure 2 we describe the syntax of these blocks and in Figure 3 an example input file is given. An output file typically contains a number of additional blocks that are computed by SplitsTree and are specific to the program. The names of such blocks all have the prefix "st_". The st_splits, st_graph and st_assumptions block contain the split system, the splits graph and the assumptions made, respectively. More pre- cisely, the latter block describes how the data was processed, e.g. whether sites were excluded, which distances transformation was applied, and which method was used to compute the splits, in other words, which items from the Options and Method menus were in effect. Additionally, the program will generate a st_spectra block, if spectral analysis was used, a st_bootstrap block, if bootstrapping was applied, or an st_extras block, if one of the additional compu- tations offered by the program was employed. As mentioned above, the program offers an on-line description of the syntax of all blocks that it understands. SplitsTree can interpret the following Nexus blocks. TAXA CHARACTERS DISTANCES ST_SPECTRA ST_SPLITS ST_GRAPH ST_ASSUMPTIONS ST_COMMANDS ST_EXTRAS ST_BOOTSTRAP Here is a brief listing of their syntax: The taxa block contains a list of all taxon names: BEGIN TAXA; DIMENSIONS NTAX=number-of-taxa; TAXLABELS taxon_1 taxon_2 ... taxon_ntax; END; The characters block is used to specify sequence data: BEGIN CHARACTERS; DIMENSIONS [NTAX=number-of-taxa] NCHAR=number-of-characters; [FORMAT [DATATYPE={STANDARD|DNA|RNA|PROTEIN}] [RESPECTCASE] [MISSING=symbol] [GAP=symbol] [SYMBOLS="symbol symbol ..."] [[NO] LABELS] [[NO] TRANSPOSE] [[NO] INTERLEAVE] [[NO] TOKENS] ;] [CHARWEIGHTS wgt_1 wgt_2 ... wgt_nchar;] MATRIX sequence data in specified format ; END; The distances block is used to specify distance data: BEGIN DISTANCES; [DIMENSIONS [NTAX=number-of-taxa] [NCHAR=number-of-chars];] [FORMAT [TRIANGLE={LOWER|UPPER|BOTH}] [[NO] DIAGONAL] [[NO] LABELS] [MISSING=symbol] ;] [[NO] FORCE_METRIC [:OFFSET=number];]; MATRIX distance data in specified format ; END; The st_assumptions block is used to control the program and to determine how the data is treated and which algorithms are applied. In interactive mode, you change settings in the st_assumptions block using the assume command. For example, to exclude all gap sites from the analysis, type assume exclude gaps; BEGIN ST_ASSUMPTIONS; [DIMENSIONS [NTAX=number-of-taxa] [NCHAR=number-of-characters];] [CONSISTENT;] [OPTIONS GAPMODE={MISSING|NEWSTATE};] [[NO] GREEDY_TREE;] [[NO] REDUCE_GRAPH;] [EXCLUDE [[NO] MISSING] [[NO] GAPS] [[NO] NONPARSIMONY] [[NO] CONSTANT [number]] [CODON1] [CODON2] [CODON3] [ADD0|ADD1|ADD2];] [EXSET [exset-name] = {characters|NONE};] [EXTAXA [taxset-name] = {taxa|NONE};] [TAXA_PARTITION [name] [= {[taxon ... taxon,] [taxon ... taxon,]...]|NONE};] [METHOD={DSPLITS|PSPLITS|SPECTRAL|WCSPLITS| DTREE|DTREE|PTREE|STREE|WCTREE|NONE};] [[NO] REFINE [N=number] [CUT=number];] [THRESHOLD=non-negative-number;] [TRANSFORM={HAMMING|LOGDET|JC|K3ST|NEI_MILLER|GAPDIST|USERMATRIX|PAM250 |COVARION|FITCHSIDOW|CCONST [number]|NONE};] [USERMATRIX nstates user defined distance-matrix;] END; Spectral analysis will produce a st_spectra block: BEGIN ST_SPECTRA; [DIMENSIONS [NTAX=number-of-taxa] [NSPECT=number-of-spectra];] [FORMAT [[NO] LABELS] [[NO] WEIGHTS] ;] [THRESHOLD = non-negative-number]; [PARAMETERS para_1 para_2 ... para_nspect;] MATRIX 1 [label_0] [weight_0] [label_1] [weight_1] ... [label_n_1] [weight_n_1] ; MATRIX 2 [label_0] [weight_0] [label_1] [weight_1] ... [label_n_2] [weight_n_2] ; ... MATRIX nspect [label_0] [weight_0] [label_1] [weight_1] ... [label_n_nspect] [weight_n_nspect] ; END; The splits are listed in the st_splits block: BEGIN ST_SPLITS; [DIMENSIONS [NTAX=number-of-taxa] [NSPLITS=number-of-splits];] [FORMAT [[NO] LABELS] [[NO] WEIGHTS] [[NO] SPECTRUM] ;] [THRESHOLD=non-negative-number]; [PROPERTIES [FIT=non-negative-number] [[{NON|WEAKLY}] COMPATIBLE] [[NON] CYCLIC] ;] [CYCLE [taxon_i_1 taxon_i_2 ... taxon_i_ntax];] [SPLITSLABELS label_1 label_2 ... label_nsplits;] MATRIX [label_1] [weight_1] [split_1,] [label_2] [weight_2] [split_2,] ... [label_nsplits] [weight_nsplits] [split_nsplits,] ; END; This contains a description of the graph, don't try editing this block, as the results are unpredictable: BEGIN ST_GRAPH; DIMENSIONS NTAX=number-of-taxa NSPLITS=number-of-splits NVERTICES=number-of-vertices NEDGES=number-of-edges;] [FORMAT [[NO] LABELS] [[NO] ANGLES] ;] [DRAW [{TO_SCALE|EQUAL_EDGES}] [MARK=split-number] [TAXLABELS={NAMES|NUMBERS|BOTH|NONE}] [SPLITLABELS={VALUES|BOOTSTRAPS|NUMBERS|NONE}] [ANGLE split-number new-angle] [HOFFSET=horizontal-offset] [VOFFSET=vertical-offset] [HFLIP=horizontal-flip] [VFLIP=vertical-flip] [ROTATE=rotation] [ZOOM={zoom-factor|AUTO}] ;] [TAXLABELOFFSET [taxon_1 x y, taxon_2 x y, ... taxon_ntax x y,] ;] [TRANSLATE [vertex_1 taxon_1, vertex_2 taxon_2, ... vertex_ntax taxon_ntax,] ;] VERTICES [label_1] x_1 y_1, [label_2] x_2 y_2, ... [label_nvertices] x_nvertices y_nvertices, ; EDGES [label_1] (v_11 w_11) (v_12 w_12) ... , [label_2] (v_21 w_21) (v_22 w_22) ... , ... [label_nsplits] (v_nsplits1 w_nsplits1) (v_nsplits2 w_nsplits2) ..., ; END; Additionally, bootstrapping will produce a block of the following form. BEGIN ST_BOOTSTRAP; [DIMENSIONS [ntax=number-of-taxa] [nchar=number-of-characters] [nsplits=number-of-splits];] [FORMAT [[NO] LABELS] [[NO] SPLITS] [[NO] ALL];] [RUNS=number-of-runs;] [LENGTH={sample-length|SAME};] [SEED=random-number-seed;] [MATRIX [1] value_1 [split_1], [2] value_2 [split_2], ... [nsplits] [value_nsplits] [split_nsplits], [nsplits+1] [value_(nsplits+1)] [split_(nsplits+1)], ... [n] [value_n] [split_n], ;] END; Finally, the program contains a couple of further useful calculations in the st_extras block: BEGIN ST_EXTRAS; [CLEAR;] [DIVERGE taxa-set-1, taxa-set-2 [: DIST=number SD=number N=number];] [TEST_CLOCK (taxon1 taxon-2) taxon-3 [: DELTA=number SD=number N=number];] *** Reference: M. Steel, A. Cooper and D. Penny (1996) [SUPPORT taxa-set [DEVIATION=number] [CONVERGENCE=number] [GAPMODE={JOKER|INGROUP_JOKER|OUTGROUP_JOKER|NEWSTATE}] [: INGROUP number POSITIONS= set-of-positions OUTGROUP number POSITIONS= set-of-positions] ;] [ESTIMATE [VARIABLE [N=number] [: MIN=number MEAN=number SD=number]];] END; Example Input File ================== Typically, either a characters block or a distances block will be specified, but not both. The first token in a file must be "#NEXUS" and the first block must be the taxa block. Comments are enclosed in square brackets and all comments between the "#NEXUS" and "BEGIN taxa" tokens are passed on to the output file by SplitsTree. ----------- #NEXUS [The first token in the file must be #NEXUS] [! Comments come in square brackets, a comment starting with "!" is printed when read] BEGIN taxa; DIMENSIONS ntax=8; TAXLABELS Tobacco Rice Marchantia Chlamy Chlorella Euglena An_nidul Olithodiscus; END; BEGIN characters; DIMENSIONS nchar=920; FORMAT datatype=RNA gap=- labels interleave; MATRIX Tobacco AAGAACCUGCCCUUGGGAGCUGGAAACGGCUGCUAAUACCCC Rice AAGAACCUGCCCUUGGGAACUGGAAACGGUUGCUAAUACCCC Marchantia AAGAACCUGCCCUUGGGAGCUGGAAACGGUUGCUAAUACCCC Chlamy AAGAACCUACCUAUCGGAUUGGGAAACUGUUGCUAAUACCCC Chlorella AAGAACCUACCUUUAGGAACUGGAAACGGUUGCUAAUACCCC Euglena AAGAAUCUGCGCUUGGGAGAUGGAAACGUUUGCUAAUGCCUC An_nidul GAGAAUCUGCCUACAGGAGUUGGAAACGACUGCUAAUACCCG Olithodiscus GAGAAUCUGCCUUUAGGAUUUGGAAACGAAUGCUAAUACCUU ... Tobacco UCGCUAGUAAUCGCCGGUCAGAUACGGCGGUGAAUUCG Rice UCGCUAGUAAUCGCCGGUCAGAUACGGCGGUGAAUCCG Marchantia UCGCUAGUAAUCGCCGGUCAGAUACGGCGGUGAAUCCG Chlamy UCGCUAGUAAUCGCCAGUCAGAUAUGGCGGUGAAUACG Chlorella UCGCUAGUAAUCGCUGGUCAGAUACAGCGGUGAAUACG Euglena UCGCUAGUAAUCGCCGGUCAGAUACGGCGGUGAAUACG An_nidul UCGCUAGUAAUCGCAGGUCAGAUACUGCGGUGAAUACG Olithodiscus UCGCUAGUAAUCGCUGGUCAGACACAGCGGUGAAU-CG ; END; BEGIN distances; FORMAT triangle=LOWER diagonal labels; MATRIX Tobacco 0 Rice 0.02815 0 Marchantia 0.03241 0.04554 0 Chlamy 0.12904 0.13856 0.11065 0 Chlorella 0.08754 0.09909 0.06990 0.11317 0 Euglena 0.15235 0.16207 0.13663 0.16205 0.12710 0 An_nidul 0.14353 0.15509 0.14003 0.16569 0.13724 0.17907 0 Olithodiscus 0.16007 0.16695 0.15076 0.18069 0.12851 0.15286 0.15208 0 ; END; --------------- References ========== Adachi, J. and Hasegawa M. (1995) Improved Dating of the Human/Chimpanzee Separation in the Mitochondrial DNA tree: Heterogeneity Among Amino Acid Sites, J. Mol. Evol. 40: 622-628. Bandelt, H.-J. and Dress, A.W.M. (1992) A Canonical Decomposition Theory for Metrics on a Finite Set, Adv. Math.92:47-05. Bandelt, H.-J. and Dress, A.W.M. (1992b) Split Decomposition: A New and Useful Approach to Phylogenetic Analysis of Distance Data, Molecular Phylogenetics and Evolution, 1(3):242-252. Bandelt, H.-J. and Dress, A.W.M. (1993) A Relational Approach to Split Decomposition, in: O. Opitz, B. Lausen and R. Klar eds., Information and Classification, Springer Berlin, pp.123-131. Dayhoff, M.O., Barker, W.C. and Hunt, L.T. (1983) Establishing Homologies in Protein Sequences, in: C.H.W. Hirs and S.N. Timasheff, eds, Enzyme Structure, Methods in Enzymology 91:524-545. Delwiche, C.F., Kushel, M., and Palmer, J.D. (1995) Phylogenetic Analysis of tufA Sequences Indicates a Cyanobacterial Origin of All Plastids. Mol. Phyl. Evol. 4:110-128. Dopazo, J., Dress, A.W.M., and von Haeseler, A. (1993) Split Decomposition: a New Technique to Analyse Viral Evolution, PNAS 90:10320-10324. Dress, A.W.M., Huson, D.H and Moulton, V. (1996) Analyzing and Visualizing Sequence and Distance Data Using SplitsTree, Discrete Applied Math. 71:95--109 Dress, A.W.M. and Wetzel, R. (1993) The Human Organism - A Place to Trive for the Immuno-Deficiency Virus, in: Proceedings of IFCS, Paris. Fitch, W. (1971) Towards Defining the Course of Evolution: Minimum Change for a Specific Tree Topology, Syst. Zool. 20:406- 416. Hendy, M.D. and Penny, D. (1992) Spectral Analysis of Phylogenetic Data, J. Classif. 10:5-24. Huson, D.H. (1998) SplitsTree: A Program for Analyzing and Visualizing Evolutionary Data, Bioinformatics, 14(10):68-73. Jukes, T.H. and Cantor, C.R. (1969) Evolution of Protein Molecules, in: H.N. Munro, ed., Mammalian Protein Metabolism, Academic Press, New York, pp.21-132. Kimura, M. (1981) Estimation of Evolutionary Distances Between Homologous Nucleotide Sequences., Proc. Natl. Acad. Sci. USA 78:454-458. Lockhart, P.J., Steel, M.A., Hendy, M.D. and Penny, D. P (1994) Recovering An Evolutionary tree Under a More Realistic Model of Lockhart, P.J., Penny, D. and Meyer, A. (1995) Testing the Phylogeny of Swordtail Fishes Using Split Decomposition and Spectral Analysis, Molecular Evolution 41:666-674. Lockhart, P.J., Larkum, A.W.D., Steel, M.A., Waddell, P.J. and Penny, D. (1996) Evolution of Chlorophyll and Bacteriochlorophyll: The Problem of Invariant Sites in Sequence Analysis. Proc. Natl. Acad. Sci. USA 93:1930-1934 Maddision, D.R., Swofford, D.L. and W.P. Maddision (1995) NEXUS: An Extendible File Format for Systematic Information. Systematic Biology (in press). Maddison, W.P, and Maddison, D.R. (1989) Interactive Analysis of phylogeny and Character Evolution using the Computer Program MacClade. Folia Primatologica. 53:190-202. McLenachan, P.A., Lockhart, P.J., Faber, H.R., and Mansfield, B.C. (1996) Evolutionary Analysis of the Multigene Pregnancy Specific b1-Glyocoprotein Family: Separation of Historical and Non Historical Signals, Molecular Evolution 42:273-280. Moulton, V. Steel, M.A. and Tuffley, C. (1997) Dissimilarity Maps and Substitution Models: Some New Results. Proceedings of the DIMACS Workshop on Mathematical Hierarchies and Biology (in press; American Mathematical Society). M. Nei, M. and Miller, J.C. (1990) A Simple Method For Estimating Average Number of Nucleotide Substitutions Within and Between Populations From Restriction Data, Genetics 1256:873-879. Sidow, A., Nguyen, T. and Speed, T.P. (1992) Estimating the Fraction of Invariable Codons With a Capture-Recapture Method, J. Mol. Evol. 35:253-260. Swofford, D.L. (1997) PAUP 5.0, Sinaur Associates, Massachusetts. Swofford, D.L., Olsen, G.J., Waddell, P.J. and Hillis, D.M (1996) Phylogenetic Inference, in: D. M. Hillis, C. Moritz and B.K. Mable eds., Molecular Systematics, 2nd ed., Sinauer Associates, Sunderland, Massachusetts, 407-514. Steel, M.A. (1994) Recovering a Tree From the Leaf Colorations it Generates Under a Markov Model, Appl. Math. Lett. 7(2):19-24. Szekely, L.A., Steel, M.A. and Erdos, P. (1993) Fourier calculus on evolutionary trees, Adv. Appl. Math. 14:200-216. Van de Peer, Y., Rensing, S.A., Maier, U.G, and De Wachter, R. (1996) Subsitution Rate Calibration of Small Ribosomal Subunit RNA Identifies Chlorachniophyte Endosymbionts as Remnants of Green Algae, Proc. Natl. Acad. Sci. USA 93:7732-7736. Waddell, P. J. (1996) Statistical Methods of Phylogenetic Analysis, Including Hadamard Conjugations, LogDet Transforms, and Maximum Likelihood. PhD thesis, Massey University, New Zealand. Wetzel, R. (1995) Zur Visualisierung abstrakter ~@hnlichkeitsÐ beziehungen, PhD thesis, University of Bielefeld. Acknowledgments =============== SplitsTree was developed within the framework of a joint coopera- tion between researchers at Bielefeld University (Germany), Massey University (Palmerston North, New Zealand) and the University of Canterbury (Christchurch, NZ) with support from the German Ministry of Science and Technology (BMFT), the New Zealand Marsden Fund and the University of Canterbury. Thanks to the following people for their support and cooperation: Hans-J~_rgen Bandelt, Andreas Dress, Mike Hendy, Pete Lockhart, Holger Paschke, Dave Penny, Mike Steel, Udo T~Znges and Rainer Wetzel. The Example section of this paper was written with the help of Pete Lockhart, who also suggested many improvements of the program and this paper. -------------------------- Daniel Huson, 8 March 1998 huson@mathematik.uni-bielefeld.de huson@math.princeton.edu