SplitsTree: A Program for Analyzing and Visualizing Evolutionary Data
=====================================================================

Overview
========

Based on the split decomposition method due to
H.-J. Bandelt and A.W.M. Dress, it takes as input a distance matrix or
a set of aligned sequences and produces as output a graph that represents
the evolutionary relationships between the taxa. For ideal
data, this graph is a tree, whereas less ideal data gives rise to a
more or less tree-like network that can be interpreted as possible
evidence for different and conflicting phylogenies. The program
supports a number of distances transformations, the computation of
parsimony splits, spectral analysis and bootstrapping. After a review
of the concepts of splits, split graphs and the method of split
decomposition, this file describes the SplitsTree program in detail.

Introduction
============

Evolutionary relationships between taxa are most often represented
as phylogenetic trees, and many different algorithms for tree con-
struction have been developed (Swofford et al. 1996). This is, of
course, justified by the assumption that evolution is a branching or
tree-like process. However, a set of real data often contains a num-
ber of different and sometimes conflicting signals and thus does not
always clearly support an unique tree.

To address this problem, H.-J. Bandelt and A.W.M. Dress (1992) de-
veloped the method of split decomposition. In contrast to methods
such as maximum parsimony and maximum likelihood that recon-
struct phylogenetic trees by optimizing parameters, split decom-
position is a transformation-based approach. Essentially, evolution-
ary data is transformed or, more precisely, "canonically decom-
posed", into a sum of "weakly compatible splits" and then repre-
sented by a so-called splits graph. For ideal data, this is a tree,
whereas less ideal data will give rise to a tree-like network that can
be interpreted as possible evidence for different and conflicting
phylogenies. Further, as split decomposition does not attempt to
force data onto a tree, it can provide a good indication of how tree-
like given data is.

There exist efficient algorithms for performing split decomposition
(Bandelt and Dress 1992) and for computing splits graphs (Wetzel
1995 and D.H. Huson, in preparation). A.W.M. Dress and R. Wetzel
produced a simple implementation of split decomposition (Wetzel
1995) as a investigative tool to help develop the general theory.
Based on their work, a first public version was developed by R.
Wetzel and the author (SplitsTree version 1). The program de-
scribed in this paper (SplitsTree version 2) is a completely new
implementation that was recently produced by the author.

In this file we first review the concepts of splits, splits graphs
and the method of split decomposition, and then discuss the
SplitsTree program in detail. For a number of biological applica-
tions of the split decomposition method, see e.g.: Bandelt and Dress
1992b, Dopazo et al. 1993, Dress and Wetzel 1993, Lockhart et al,
1995, Wetzel 1995, Dress et al. 1996, McLenachan et al. 1996, or P.J.
Lockhart et al., in preparation.

Splits and Splits Graphs
========================

Evolutionary relationships are generally represented by a phyloge-
netic tree, T, that is, a tree whose leaves are labeled by a set X of
taxa and whose remaining vertices are unlabeled and of degree at
least 3. (We only consider unrooted trees in this paper.) Any edge e
of T defines a split S={A,A'} of X, i.e. a partitioning of X into two
non-empty sets A and A', consisting of all taxa on the one side, or
the other, of e in T. Such a system S of splits is called compatible, if
for any two splits S1={A1,A'1} and S2={A2,A'2} in S, one of the four
intersections

A1 n A2, A1 n A'2,  A'1 n A2, or A'1n A'2

is empty. Any phylogenetic tree T gives rise to a compatible split
system S . In 1971, P. Buneman established that, vice versa, any
compatible split system S corresponds to a unique phylogenetic tree
T (Buneman 1971). So, tree reconstruction for a given set of taxa X
is equivalent to computing a compatible system of splits S for X and
determining a weight for each split S that corresponds to the length
of the associated edge.

Hence, to obtain more general graphs, one must consider less re-
stricted systems of splits. Let X be a set of taxa. A system of splits S
of X is called weakly compatible, if for any three splits S1, S2, S3 and
all Ai in Si (i=1,2,3), at least one of the four intersections

A1 n A2 n A3, A1 n A'2 n A'3, A'1 n A2 n A'3, or A'1 n A'2 n A3

is empty (Bandelt and Dress 1992). So, in particular, any two splits
are permitted to be incompatible. Intermediately, S is called circu-
lar, if there exists an ordering x1,x2,...,xm of the taxa such that for
every split SÎS there exists AÎS with A={xp,xp+1,...,xq} and1£p£q£m.
One can prove that a circular split system is always weakly compat-
ible and a compatible split system is always circular.

A splits graph representing a weakly compatible split system S is a
graph G(S)=(V,E) whose vertices vÎV are labeled by the set of taxa
X and whose edges eÎE are straight line segments that represent
the splits in S, see Figure 1. More precisely, each split S={A,A'}ÎS is
represented by a band of parallel edges of equal length in such a
way that deleting all edges in such a band partitions the graph into
precisely two components, one containing all vertices labeled by
taxa in A and the other containing all vertices labeled by taxa in A'.
The length of the edges representing a given split S indicates its
weight or support and is calculated as the isolation index of S. For
algorithms that compute splits graphs, see (Wetzel 1995 and D.H.
Huson, in preparation).

Consider a weakly compatible system of splits S of a set X of taxa. If
S is compatible, then G(S) is a phylogenetic tree. What is the situa-
tion if S is merely circular? Then G(S) can be realized as a planar
graph (Wetzel 1995 and D.H. Huson, in preparation). Finally, if S is
not circular, then in general G(S) will not be planar. In biological
applications, the arising split systems are often either circular or
mildly non-circular.

Split Decomposition
===================

Split decomposition is a method for obtaining a system of weakly
compatible splits with weights from a given set of evolutionary
data. So, assume we are given a set of taxa X and a distance map
d:X*X -> R>=0 on X, i.e. a matrix representing the evolutionary dis-
tances between pairs of taxa. H.-J. Bandelt and A.W.M. Dress (1992)
showed that such a distance map d has the following unique de-
composition:
d=S aS×dS + d0.

Here, we sum over all possible splits S of X; the map dS:X*X -> R>=0 is
the split metric  on S that equals 1, if x and y lie on different sides
of S, and 0, otherwise; the number aS ³ 0 is the weight or isolation-
index of the split S; and the map d0:X*X -> R>=0 is the so-called split
prime residue and cannot be decomposed further. A split S with aS
>0 is called a d-split, and the system S of all d-splits is weakly com-
patible and can be computed efficiently, see (Bandelt and Dress
1992) for details.

If there is no split prime residue, then the distance between any
two taxa x and y is precisely equal to the sum of weights of all d-
splits that separate x and y, and thus proportional to the sum of all
edge lengths along a shortest path from x to y in the splits graph.
However, in general, the split prime residue will be positive and so
the sum of weights will only give an approximation (from below) of
the original distances. The fit of the approximation is measured by
the sum of all approximated distances divided by the sum of all
original distances. In biological applications, the fit is often quite
high and a small split prime residue can be considered as "noise".

If we are given a set of aligned sequences, then to apply split de-
composition we must first compute a distance matrix d using an ap-
propriate distance transformation. Alternatively, one can compute
the so-called parsimony splits, or p-splits, directly from the se-
quences, as described in (Bandelt and Dress 1993). Yet another
possibility is to use spectral analysis (Hendy and Penny 1993,
Szekely et al. 1993 and M.D. Hendy and P.J. Waddell, in preparation)
to assign a weight (the
so-called g-value) to each possible split of X. One can then greedily
extract a weakly compatible (or compatible) system of splits, i.e. by
considering all such splits S in decreasing order of weight and in-
serting the split S into S if it is weakly compatible (or compatible)
with all splits already in S.

Description of SplitsTree
=========================

SplitsTree is a command line driven program for analyzing and visualizing
evolutionary data. It takes as input a file containing sequences, distances,
or a system of splits and produces as output a weakly compatible system
of splits and a splits graph representing the given data. It contains a
number of transformations to obtain distances from sequences and methods
for obtaining compatible or weakly compatible split systems from
distances or sequences.

In most cases, the user will open a file using the
open file; command and then use the assume command to specify which
caculations are to be peformed. Then, the draw command will be typed
to see a picture of the computed splits graph, the show command
will be used to inspect the different blocks of data computed
and then perhaps the save filename; command will be used to
save some of the data.

List of Commands
================

The program offers the following commands:
    open filename; save [blocks] filename; show [{blocks|keywords}];
    assume assumption; draw [geometry commands];
    bootstrap number-of-runs;
    ps [filename]; shell {command|"commands"};
    help [name]; title name; info; version, thanks, close; quit;
    import REAP filename;

open filename;
- Read input from a file in NEXUS format

save [blocks] filename;
- Save blocks to a file. If no blocks listed, saves all valid blocks

show [blocks];
- Show valid blocks on screen

show {TREE | TREETOP};
- Show tree (or tree topology) in Newick's 8:45 standard format

assume assumption;
- Enter an assumption, i.e. any legal ST_ASSUMPTIONS-block command.
This is the main command used to control which computations are performed

draw [draw-commands];
- Draw the current splitsgraph using ghostview.
If the environment variable SPLITSVIEWER is set to a PS viewer, uses that
viewer instead.
A list of all draw commands can be found in the DRAW part of the ST_GRAPH block

bootstrap number-of-runs;
- Compute bootstrap support

ps [filename];
- Save the current splits graph in PostScript

shell command;
- Execute a shell command

help [name];
- Provides online help on the named subject

title name;
- Give current document a new title

info;
- Show general information about current document

version;
- Show version of program

thanks;
- Show thanks

quit;
- Quit program

File Format
===========

SplitsTree is based on the new nexus format (Maddison et al.
1995), which was originally developed for the programs PAUP
(Swofford 1997) and MacClade (Maddison and Maddison1989).
Input data is described in the three standard block types: taxa,
characters and distances.  More precisely, an input file will typi-
cally consist of a taxa block listing the names of the given taxa and
either a characters block containing a set of e.g. DNA-, RNA-, pro-
tein-, or RFLP sequences, or a distances block containing a distance
or dissimilarity matrix. In Figure 2 we describe the syntax of these
blocks and in Figure 3 an example input file is given.

An output file typically contains a number of additional blocks that
are computed by SplitsTree and are specific to the program. The
names of such blocks all have the prefix "st_". The st_splits,
st_graph and st_assumptions block contain the split system, the
splits graph and the assumptions made, respectively. More pre-
cisely, the latter block describes how the data was processed, e.g.
whether sites were excluded, which distances transformation was
applied, and which method was used to compute the splits,  in other
words, which items from the Options and Method menus were in
effect.

Additionally, the program will generate a st_spectra block, if
spectral analysis was used, a st_bootstrap block, if bootstrapping
was applied, or an st_extras block, if one of the additional compu-
tations offered by the program was employed. As mentioned above,
the program offers an on-line description of the syntax of all blocks
that it understands.

SplitsTree can interpret the following Nexus blocks.

	TAXA CHARACTERS DISTANCES ST_SPECTRA ST_SPLITS ST_GRAPH
	ST_ASSUMPTIONS ST_COMMANDS ST_EXTRAS ST_BOOTSTRAP

Here is a brief listing of their syntax:

The taxa block contains a list of all taxon names:

BEGIN TAXA;
    DIMENSIONS NTAX=number-of-taxa;
    TAXLABELS taxon_1 taxon_2 ... taxon_ntax;
END;

The characters block is used to specify sequence data:

BEGIN CHARACTERS;
    DIMENSIONS [NTAX=number-of-taxa] NCHAR=number-of-characters;
    [FORMAT
        [DATATYPE={STANDARD|DNA|RNA|PROTEIN}]
        [RESPECTCASE]
        [MISSING=symbol]
        [GAP=symbol]
        [SYMBOLS="symbol symbol ..."]
        [[NO] LABELS]
        [[NO] TRANSPOSE]
        [[NO] INTERLEAVE]
        [[NO] TOKENS]
    ;]
    [CHARWEIGHTS wgt_1 wgt_2 ... wgt_nchar;]
    MATRIX
        sequence data in specified format
    ;
END;

The distances block is used to specify distance data:

BEGIN DISTANCES;
    [DIMENSIONS [NTAX=number-of-taxa] [NCHAR=number-of-chars];]
    [FORMAT
        [TRIANGLE={LOWER|UPPER|BOTH}]
        [[NO] DIAGONAL]
        [[NO] LABELS]
        [MISSING=symbol]
    ;]
    [[NO] FORCE_METRIC [:OFFSET=number];];
    MATRIX
        distance data in specified format
    ;
END;

The st_assumptions block is used to control the program and to determine
how the data is treated and which algorithms are applied.
In interactive mode, you change settings in the st_assumptions block
using the assume command. For example, to exclude all gap sites from
the analysis, type   assume exclude gaps;

BEGIN ST_ASSUMPTIONS;
    [DIMENSIONS [NTAX=number-of-taxa] [NCHAR=number-of-characters];]
    [CONSISTENT;]
    [OPTIONS GAPMODE={MISSING|NEWSTATE};]
    [[NO] GREEDY_TREE;]
    [[NO] REDUCE_GRAPH;]
    [EXCLUDE [[NO] MISSING] [[NO] GAPS] [[NO] NONPARSIMONY]
				[[NO] CONSTANT [number]]
     [CODON1] [CODON2] [CODON3] [ADD0|ADD1|ADD2];]
    [EXSET [exset-name] =  {characters|NONE};]
    [EXTAXA [taxset-name] = {taxa|NONE};]
    [TAXA_PARTITION [name] [= {[taxon ... taxon,] [taxon ... taxon,]...]|NONE};]
    [METHOD={DSPLITS|PSPLITS|SPECTRAL|WCSPLITS|
              DTREE|DTREE|PTREE|STREE|WCTREE|NONE};]
    [[NO] REFINE [N=number] [CUT=number];]
    [THRESHOLD=non-negative-number;]
    [TRANSFORM={HAMMING|LOGDET|JC|K3ST|NEI_MILLER|GAPDIST|USERMATRIX|PAM250
    |COVARION|FITCHSIDOW|CCONST [number]|NONE};]
    [USERMATRIX nstates
        user defined distance-matrix;]
END;

Spectral analysis will produce a st_spectra block:

BEGIN ST_SPECTRA;
    [DIMENSIONS [NTAX=number-of-taxa] [NSPECT=number-of-spectra];]
    [FORMAT
        [[NO] LABELS]
        [[NO] WEIGHTS]
    ;]
    [THRESHOLD = non-negative-number];
    [PARAMETERS para_1 para_2 ... para_nspect;]
    MATRIX 1
        [label_0] [weight_0]
        [label_1] [weight_1]
        ...
        [label_n_1] [weight_n_1]
    ;
    MATRIX 2
        [label_0] [weight_0]
        [label_1] [weight_1]
        ...
        [label_n_2] [weight_n_2]
    ;
    ...
    MATRIX nspect
        [label_0] [weight_0]
        [label_1] [weight_1]
        ...
        [label_n_nspect] [weight_n_nspect]
    ;
END;

The splits are listed in the st_splits block:

BEGIN ST_SPLITS;
    [DIMENSIONS [NTAX=number-of-taxa] [NSPLITS=number-of-splits];]
    [FORMAT
        [[NO] LABELS]
        [[NO] WEIGHTS]
        [[NO] SPECTRUM]
    ;]
    [THRESHOLD=non-negative-number];
    [PROPERTIES
        [FIT=non-negative-number]
        [[{NON|WEAKLY}] COMPATIBLE]
        [[NON] CYCLIC]
    ;]
    [CYCLE [taxon_i_1 taxon_i_2 ... taxon_i_ntax];]
    [SPLITSLABELS label_1 label_2 ... label_nsplits;]
    MATRIX
        [label_1] [weight_1] [split_1,]
        [label_2] [weight_2] [split_2,]
        ...
        [label_nsplits] [weight_nsplits] [split_nsplits,]
    ;
END;

This contains a description of the graph, don't try editing
this block, as the results are unpredictable:

BEGIN ST_GRAPH;
    DIMENSIONS NTAX=number-of-taxa NSPLITS=number-of-splits
        NVERTICES=number-of-vertices NEDGES=number-of-edges;]
    [FORMAT
        [[NO] LABELS]
        [[NO] ANGLES]
    ;]
    [DRAW
        [{TO_SCALE|EQUAL_EDGES}]
        [MARK=split-number]
        [TAXLABELS={NAMES|NUMBERS|BOTH|NONE}]
        [SPLITLABELS={VALUES|BOOTSTRAPS|NUMBERS|NONE}]
        [ANGLE split-number new-angle]
        [HOFFSET=horizontal-offset]
        [VOFFSET=vertical-offset]
        [HFLIP=horizontal-flip]
        [VFLIP=vertical-flip]
        [ROTATE=rotation]
        [ZOOM={zoom-factor|AUTO}]
    ;]
    [TAXLABELOFFSET
        [taxon_1 x y,
        taxon_2 x y,
        ...
        taxon_ntax x y,]
    ;]
    [TRANSLATE
        [vertex_1 taxon_1,
        vertex_2 taxon_2,
        ...
        vertex_ntax taxon_ntax,]
    ;]
    VERTICES
         [label_1] x_1 y_1,
        [label_2] x_2 y_2,
         ...
         [label_nvertices] x_nvertices y_nvertices,
    ;
    EDGES
        [label_1] (v_11 w_11) (v_12 w_12) ... ,
        [label_2] (v_21 w_21) (v_22 w_22) ... ,
        ...
        [label_nsplits] (v_nsplits1 w_nsplits1) (v_nsplits2 w_nsplits2) ...,
    ;
END;

Additionally, bootstrapping will produce a block of the following form.

BEGIN ST_BOOTSTRAP;
    [DIMENSIONS [ntax=number-of-taxa] [nchar=number-of-characters]
                [nsplits=number-of-splits];]
    [FORMAT [[NO] LABELS] [[NO] SPLITS] [[NO] ALL];]
    [RUNS=number-of-runs;]
    [LENGTH={sample-length|SAME};]
    [SEED=random-number-seed;]
    [MATRIX
        [1] value_1 [split_1],
        [2] value_2 [split_2],
        ...
        [nsplits] [value_nsplits] [split_nsplits],
        [nsplits+1] [value_(nsplits+1)] [split_(nsplits+1)],
        ...
        [n] [value_n] [split_n],
    ;]
END;

Finally, the program contains a couple of further useful calculations
in the st_extras block:

BEGIN ST_EXTRAS;
    [CLEAR;]
    [DIVERGE taxa-set-1, taxa-set-2 [: DIST=number SD=number N=number];]
    [TEST_CLOCK (taxon1 taxon-2) taxon-3 [: DELTA=number SD=number N=number];]
    *** Reference: M. Steel, A. Cooper and D. Penny (1996)
    [SUPPORT taxa-set
        [DEVIATION=number]
        [CONVERGENCE=number]
        [GAPMODE={JOKER|INGROUP_JOKER|OUTGROUP_JOKER|NEWSTATE}]
      [: INGROUP number POSITIONS= set-of-positions
        OUTGROUP number POSITIONS= set-of-positions]
    ;]
    [ESTIMATE [VARIABLE [N=number] [: MIN=number MEAN=number SD=number]];]
END;

Example Input File
==================

Typically, either a characters block
or a distances block will be specified, but not both. The first token
in a file must be "#NEXUS" and the first block must be the taxa
block. Comments are enclosed in square brackets and all comments
between the "#NEXUS" and "BEGIN taxa" tokens are passed on to
the output file by SplitsTree.

-----------
#NEXUS [The first token in the file must be #NEXUS]
[! Comments come in square brackets, a comment starting
with "!" is printed when read]
BEGIN taxa;
    DIMENSIONS ntax=8;
TAXLABELS Tobacco Rice Marchantia Chlamy Chlorella Euglena
          An_nidul Olithodiscus;
END;

BEGIN characters;
    DIMENSIONS nchar=920;
    FORMAT datatype=RNA gap=- labels interleave;
    MATRIX
    Tobacco      AAGAACCUGCCCUUGGGAGCUGGAAACGGCUGCUAAUACCCC
    Rice         AAGAACCUGCCCUUGGGAACUGGAAACGGUUGCUAAUACCCC
    Marchantia   AAGAACCUGCCCUUGGGAGCUGGAAACGGUUGCUAAUACCCC
    Chlamy       AAGAACCUACCUAUCGGAUUGGGAAACUGUUGCUAAUACCCC
    Chlorella    AAGAACCUACCUUUAGGAACUGGAAACGGUUGCUAAUACCCC
    Euglena      AAGAAUCUGCGCUUGGGAGAUGGAAACGUUUGCUAAUGCCUC
    An_nidul     GAGAAUCUGCCUACAGGAGUUGGAAACGACUGCUAAUACCCG
    Olithodiscus GAGAAUCUGCCUUUAGGAUUUGGAAACGAAUGCUAAUACCUU
       ...
    Tobacco      UCGCUAGUAAUCGCCGGUCAGAUACGGCGGUGAAUUCG
    Rice         UCGCUAGUAAUCGCCGGUCAGAUACGGCGGUGAAUCCG
    Marchantia   UCGCUAGUAAUCGCCGGUCAGAUACGGCGGUGAAUCCG
    Chlamy       UCGCUAGUAAUCGCCAGUCAGAUAUGGCGGUGAAUACG
    Chlorella    UCGCUAGUAAUCGCUGGUCAGAUACAGCGGUGAAUACG
    Euglena      UCGCUAGUAAUCGCCGGUCAGAUACGGCGGUGAAUACG
    An_nidul     UCGCUAGUAAUCGCAGGUCAGAUACUGCGGUGAAUACG
    Olithodiscus UCGCUAGUAAUCGCUGGUCAGACACAGCGGUGAAU-CG
    ;
END;

BEGIN distances;
    FORMAT triangle=LOWER diagonal labels;
    MATRIX
    Tobacco      0
    Rice         0.02815 0
    Marchantia   0.03241 0.04554 0
    Chlamy       0.12904 0.13856 0.11065 0
    Chlorella    0.08754 0.09909 0.06990 0.11317 0
    Euglena      0.15235 0.16207 0.13663 0.16205 0.12710 0
    An_nidul     0.14353 0.15509 0.14003 0.16569 0.13724
                 0.17907 0
    Olithodiscus 0.16007 0.16695 0.15076 0.18069 0.12851
                 0.15286 0.15208 0
    ;
END;
---------------


References
==========

Adachi, J. and Hasegawa M. (1995) Improved Dating of the
Human/Chimpanzee Separation in the Mitochondrial DNA tree:
Heterogeneity Among Amino Acid Sites, J. Mol. Evol. 40: 622-628.

Bandelt, H.-J. and Dress, A.W.M. (1992) A Canonical Decomposition
Theory for Metrics on a Finite Set, Adv. Math.92:47-05.

Bandelt, H.-J. and Dress, A.W.M. (1992b) Split Decomposition: A New
and Useful Approach to Phylogenetic Analysis of Distance Data,
Molecular Phylogenetics and Evolution, 1(3):242-252.

Bandelt, H.-J. and Dress, A.W.M. (1993) A Relational Approach to
Split Decomposition, in: O. Opitz, B. Lausen and R. Klar eds.,
Information and Classification, Springer Berlin, pp.123-131.

Dayhoff, M.O., Barker, W.C. and Hunt, L.T. (1983) Establishing
Homologies in Protein Sequences, in: C.H.W. Hirs and S.N. Timasheff,
eds, Enzyme Structure, Methods in Enzymology 91:524-545.

Delwiche, C.F., Kushel, M., and Palmer, J.D. (1995) Phylogenetic
Analysis of tufA Sequences Indicates a Cyanobacterial Origin of All
Plastids. Mol. Phyl. Evol. 4:110-128.

Dopazo, J., Dress, A.W.M., and von Haeseler, A. (1993) Split
Decomposition: a New Technique to Analyse Viral Evolution, PNAS
90:10320-10324.

Dress, A.W.M., Huson, D.H and Moulton, V. (1996) Analyzing and
Visualizing Sequence and Distance Data Using SplitsTree,
Discrete Applied Math. 71:95--109

Dress, A.W.M. and Wetzel, R. (1993) The Human Organism - A Place
to Trive for the Immuno-Deficiency Virus, in: Proceedings of IFCS,
Paris.

Fitch, W. (1971) Towards Defining the Course of Evolution:
Minimum Change for a Specific Tree Topology, Syst. Zool. 20:406-
416.

Hendy, M.D. and Penny, D. (1992) Spectral Analysis of Phylogenetic
Data, J. Classif. 10:5-24.

Huson, D.H. (1998) SplitsTree: A Program for Analyzing and Visualizing
Evolutionary Data, Bioinformatics, 14(10):68-73.

Jukes, T.H. and Cantor, C.R. (1969) Evolution of Protein Molecules, in:
H.N. Munro, ed., Mammalian Protein Metabolism, Academic Press,
New York, pp.21-132.

Kimura, M. (1981) Estimation of Evolutionary Distances Between
Homologous Nucleotide Sequences., Proc. Natl. Acad. Sci. USA
78:454-458.

Lockhart, P.J., Steel, M.A., Hendy, M.D. and Penny, D. P (1994)
Recovering An Evolutionary tree Under a More Realistic Model of

Lockhart, P.J., Penny, D. and Meyer, A. (1995) Testing the Phylogeny
of Swordtail Fishes Using Split Decomposition and Spectral Analysis,
Molecular Evolution 41:666-674.

Lockhart, P.J., Larkum, A.W.D., Steel, M.A., Waddell, P.J. and Penny,
D. (1996) Evolution of Chlorophyll and Bacteriochlorophyll: The
Problem of Invariant Sites in Sequence Analysis. Proc. Natl. Acad.
Sci. USA 93:1930-1934

Maddision, D.R., Swofford, D.L. and W.P. Maddision (1995) NEXUS:
An Extendible File Format for Systematic Information. Systematic
Biology (in press).

Maddison, W.P, and Maddison, D.R.  (1989) Interactive Analysis of
phylogeny and Character Evolution using the Computer Program
MacClade. Folia Primatologica. 53:190-202.

McLenachan, P.A., Lockhart, P.J., Faber, H.R., and Mansfield, B.C.
(1996) Evolutionary Analysis of the Multigene Pregnancy Specific
b1-Glyocoprotein Family: Separation of Historical and Non Historical
Signals, Molecular Evolution 42:273-280.

Moulton, V. Steel, M.A. and Tuffley, C. (1997) Dissimilarity Maps
and Substitution Models: Some New Results. Proceedings of the
DIMACS Workshop on Mathematical Hierarchies and Biology (in
press; American Mathematical Society).

M. Nei, M. and Miller, J.C. (1990)  A Simple Method For Estimating
Average Number of Nucleotide Substitutions Within and Between
Populations From Restriction Data, Genetics 1256:873-879.

Sidow, A., Nguyen, T. and Speed, T.P. (1992) Estimating the Fraction
of Invariable Codons With a Capture-Recapture Method, J. Mol. Evol.
35:253-260.

Swofford, D.L. (1997) PAUP 5.0, Sinaur Associates, Massachusetts.

Swofford, D.L., Olsen, G.J., Waddell, P.J. and Hillis, D.M (1996)
Phylogenetic Inference, in: D. M. Hillis, C. Moritz and B.K. Mable eds.,
Molecular Systematics, 2nd ed., Sinauer Associates, Sunderland,
Massachusetts, 407-514.

Steel, M.A. (1994) Recovering a Tree From the Leaf Colorations it
Generates Under a Markov Model, Appl. Math. Lett. 7(2):19-24.

Szekely, L.A., Steel, M.A. and Erdos, P. (1993) Fourier calculus on
evolutionary trees, Adv. Appl. Math. 14:200-216.

Van de Peer, Y., Rensing, S.A., Maier, U.G, and De Wachter, R. (1996)
Subsitution Rate Calibration of Small Ribosomal Subunit RNA
Identifies Chlorachniophyte Endosymbionts as Remnants of Green
Algae, Proc. Natl. Acad. Sci. USA 93:7732-7736.

Waddell, P. J. (1996) Statistical Methods of Phylogenetic Analysis,
Including Hadamard Conjugations, LogDet Transforms, and
Maximum Likelihood. PhD thesis, Massey University, New Zealand.

Wetzel, R. (1995) Zur Visualisierung abstrakter ~@hnlichkeitsÐ
beziehungen, PhD thesis, University of Bielefeld.

Acknowledgments
===============

SplitsTree was developed within the framework of a joint coopera-
tion between researchers at Bielefeld University (Germany), Massey
University (Palmerston North, New Zealand) and the University of
Canterbury (Christchurch, NZ) with support from the German
Ministry of Science and Technology (BMFT), the New Zealand
Marsden Fund and the University of Canterbury.

Thanks to the following people for their support and cooperation:
Hans-J~_rgen Bandelt, Andreas Dress, Mike Hendy, Pete Lockhart,
Holger Paschke, Dave Penny, Mike Steel, Udo T~Znges and Rainer
Wetzel. The Example section of this paper was written with the help
of Pete Lockhart, who also suggested many improvements of the
program and this paper.

--------------------------
Daniel Huson, 8 March 1998
huson@mathematik.uni-bielefeld.de
huson@math.princeton.edu