The following is a collection of details about using AlignACE inspired by questions from users.
AlignACE is a command-line program. Make sure your path includes the directory where AlignACE is located. If you run AlignACE without options, it will return a list of all possible options. For example:
[jhughes@atlas]$ AlignACE
AlignACE 3.0 04/13/02
Usage: AlignACE -i seqfile (options)
Seqfile must be in FASTA format.
Options:
-numcols number of columns to align (10)
-nocols turns off column sampling
-expect number of sites expected in model (10)
-gcback background fractional GC content of input sequence (0.38)
-minpass minimum number of non-improved passes in phase 1 (200)
-seed set seed for random number generator (time)
-undersample possible sites / (expect * numcols * seedings) (1)
-oversample 1/undersample (1)
Output format:
column 1: site sequence
column 2: sequence number
column 3: position of site within sequence
column 4: strand of site (1=forward, 0=reverse)
The options correspond to the parameters described in the JMB paper.
The most important option is -gcback, which should be set to the GC content of the genome of interest or of the sequences being searched. If, for example, the sequences being searched are much more AT-rich than -gcback indicates, the motifs returned will be disproportionately AT-rich as well.
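One way to pick a value for -gcback is to measure the GC fraction of the input sequences directly. The following is a minimal sketch, not part of AlignACE; the function name gc_fraction is hypothetical, and it assumes that only A, C, G and T should be counted (ambiguity codes such as N are ignored).

```python
def gc_fraction(fasta_text):
    """Return the fraction of G and C among the A/C/G/T bases in FASTA text."""
    gc = 0
    total = 0
    for line in fasta_text.splitlines():
        line = line.strip()
        if not line or line.startswith(">"):
            continue  # skip blank lines and FASTA header lines
        for base in line.upper():
            if base in "GC":
                gc += 1
                total += 1
            elif base in "AT":
                total += 1
            # any other symbol (for example N) is left out of both counts
    if total == 0:
        raise ValueError("no A/C/G/T bases found")
    return gc / total


# Illustrative input only: two short made-up sequences.
example = ">seq1\nGGCC\n>seq2\nAATT\n"
print(round(gc_fraction(example), 2))
```

To use it on a real file, read the file contents (for example GAL.seq) into a string, pass the string to gc_fraction, and give the rounded result to -gcback.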
The -oversample and -undersample options (use one or the other, not both) make the search more or less exhaustive, respectively: a more exhaustive search costs more computation time, and a less exhaustive one saves it. Run with the default setting first to get a baseline, then adjust using one of the two options.
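The usage text defines -undersample as possible sites / (expect * numcols * seedings) and -oversample as 1/undersample. Reading that line as a formula and solving for the number of seedings gives the sketch below. This is an interpretation of the usage text, not a description of the AlignACE source code; the function name implied_seedings is hypothetical, and the value used for "possible sites" is made up for illustration.

```python
def implied_seedings(possible_sites, expect=10, numcols=10, undersample=1.0):
    """Solve undersample = possible_sites / (expect * numcols * seedings) for seedings."""
    return possible_sites / (expect * numcols * undersample)


possible_sites = 10000  # hypothetical value, for illustration only

baseline = implied_seedings(possible_sites)                     # default: undersample = 1
fewer = implied_seedings(possible_sites, undersample=2.0)       # -undersample 2
more = implied_seedings(possible_sites, undersample=1.0 / 2.0)  # -oversample 2

print(baseline, fewer, more)
```

Under this reading, raising -undersample reduces the number of seedings (a faster, less thorough search), and raising -oversample increases it.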
The -seed option may be used to get exactly reproducible results.
The -nocols option may be used to get motifs of a prescribed width.
The other options are less likely to be used, but are included for completeness. See the JMB paper for a description.
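When AlignACE is driven from a script, the options discussed above can be assembled into an argument list. The sketch below is one way to do that. The helper build_alignace_command is hypothetical, and it assumes that each option value follows its flag as a separate argument, in the same way that the sequence file follows -i in the usage line.

```python
def build_alignace_command(seqfile, gcback=None, seed=None, numcols=None, nocols=False):
    """Build an argument list for AlignACE using flags from its usage message."""
    cmd = ["AlignACE", "-i", seqfile]
    if gcback is not None:
        cmd += ["-gcback", str(gcback)]    # background GC fraction
    if seed is not None:
        cmd += ["-seed", str(seed)]        # fixed seed for reproducible runs
    if numcols is not None:
        cmd += ["-numcols", str(numcols)]  # number of columns to align
    if nocols:
        cmd.append("-nocols")              # turn off column sampling
    return cmd


# The seed value 1 is arbitrary; 0.38 is the default -gcback shown in the usage text.
print(" ".join(build_alignace_command("GAL.seq", gcback=0.38, seed=1)))
```

The returned list can be passed to subprocess.run with stdout redirected to a file, which mirrors redirecting the output with > on the command line.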
A sample input file for AlignACE, GAL.seq, is included with the
download package. This is a FASTA-formatted file containing sequences
upstream of a number of galactose utilization genes in Saccharomyces
cerevisiae. To run AlignACE on these sequences, execute the following
command:
$ AlignACE -i GAL.seq > test.ace
The AlignACE output file contains a version number, a copy of the input command line, and the values for all relevant parameters. This is followed by a list of the names of the input sequences. The numbers associated with these names are used in the subsequent motif descriptions to refer to the input sequences. Motifs are then listed in order of descending MAP score, the metric for motif strength used by AlignACE (see JMB paper).
The fields in AlignACE output following Motif x are:
1: site sequence (*'s below indicate 'active' motif columns)
2: number of the input sequence in which the site was found (the numbers are listed with the sequence names at the top of the output file)
3: position of the site in that sequence (specifically, the position of the site column nearest the beginning of the input sequence)
4: strand (1=forward, 0=reverse)
As mentioned above, the *'s indicate active columns.
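The four fields above map naturally onto a small record type. The following sketch parses a single site line. It assumes that the fields are separated by whitespace, which this document does not state explicitly; the names Site and parse_site_line are hypothetical, and the example line is made up rather than taken from real AlignACE output.

```python
from typing import NamedTuple


class Site(NamedTuple):
    sequence: str     # field 1: site sequence
    seq_number: int   # field 2: number of the input sequence
    position: int     # field 3: position of the site in that sequence
    forward: bool     # field 4: True for strand 1 (forward), False for 0 (reverse)


def parse_site_line(line):
    """Parse one site line into a Site, assuming whitespace-separated fields."""
    fields = line.split()
    if len(fields) != 4:
        raise ValueError("expected 4 fields, got %d" % len(fields))
    sequence, seq_number, position, strand = fields
    if strand not in ("0", "1"):
        raise ValueError("strand must be 0 or 1, got %r" % strand)
    return Site(sequence, int(seq_number), int(position), strand == "1")


# Made-up example line, for illustration only.
print(parse_site_line("ACGTACGTAC\t3\t120\t1"))
```

Lines that are not site lines, such as the header, the Motif x lines, and the row of *'s, would need to be skipped before calling this function.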
It is not possible to search for motifs on a single strand/direction of sequence. This is an often-requested option that is not currently supported.
This code was developed under Linux (RH 7.1) and compiled with gcc 2.96.
Questions and comments should be referred to Jason Hughes at jhughes@post.harvard.edu.