The gsuffix Library

Introduction

The gsuffix library contains C implementations of generalized suffix-based algorithms for searching for string patterns in sets of input strings. It is released under the terms of the LGPL. One important application is searching for motifs in biosequences (DNA or protein). The library provides a unified interface for application code, which can use any of the following: a standard generalized suffix tree; a k-truncated suffix tree, which is a modified suffix tree for searching for short patterns (up to length k) in multiple input sequences; or one of several versions of the suffix array, including generalized extended suffix arrays. Each of these data structures has advantages and disadvantages with respect to memory use, speed, and flexibility. With gsuffix, developers can try each of them in turn with only very minor modifications to application code.
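To make the idea of a generalized suffix structure concrete, here is a minimal sketch of a naive generalized suffix array with binary-search lookup. It is not the gsuffix API: the names GenSuffixArray, Suffix, gsa_build and gsa_find are hypothetical and exist only for this illustration. The construction simply sorts all suffixes with qsort, which is far slower than the algorithms a real library would use.

```c
#include <assert.h>
#include <stddef.h>
#include <stdlib.h>
#include <string.h>

/* Illustrative only: these types and functions are NOT part of gsuffix. */

/* One suffix of one input string. */
typedef struct {
    size_t seq;       /* index of the input string */
    size_t pos;       /* start offset of the suffix in that string */
    const char *txt;  /* pointer to the first character of the suffix */
} Suffix;

typedef struct {
    Suffix *sa;  /* suffixes of all input strings, sorted lexicographically */
    size_t n;    /* total number of suffixes */
} GenSuffixArray;

/* Order suffixes by text; break ties (equal suffixes of different
 * strings) by string index so the result is deterministic. */
static int cmp_suffix(const void *a, const void *b)
{
    const Suffix *x = a, *y = b;
    int c = strcmp(x->txt, y->txt);
    if (c != 0)
        return c;
    return (x->seq > y->seq) - (x->seq < y->seq);
}

/* Naive construction: collect every suffix of every string, then sort.
 * Returns 0 on success, -1 if memory allocation fails. The input
 * strings must outlive the array, which only stores pointers into them. */
int gsa_build(GenSuffixArray *g, const char *const *seqs, size_t nseqs)
{
    size_t total = 0, k = 0;

    for (size_t i = 0; i < nseqs; i++)
        total += strlen(seqs[i]);

    g->n = 0;
    g->sa = total ? malloc(total * sizeof *g->sa) : NULL;
    if (total && !g->sa)
        return -1;

    for (size_t i = 0; i < nseqs; i++) {
        size_t len = strlen(seqs[i]);
        for (size_t j = 0; j < len; j++)
            g->sa[k++] = (Suffix){ i, j, seqs[i] + j };
    }
    g->n = total;
    if (total)
        qsort(g->sa, g->n, sizeof *g->sa, cmp_suffix);
    return 0;
}

/* Find all occurrences of pat. Returns the number of matches and stores
 * in *first the index of the first matching entry; the matches occupy
 * sa[*first] .. sa[*first + count - 1]. */
size_t gsa_find(const GenSuffixArray *g, const char *pat, size_t *first)
{
    assert(pat != NULL && first != NULL);
    size_t m = strlen(pat), lo = 0, hi = g->n;

    /* Binary search for the first suffix whose m-character prefix
     * is not smaller than the pattern. */
    while (lo < hi) {
        size_t mid = lo + (hi - lo) / 2;
        if (strncmp(g->sa[mid].txt, pat, m) < 0)
            lo = mid + 1;
        else
            hi = mid;
    }

    /* Suffixes that start with the pattern are contiguous from lo. */
    size_t end = lo;
    while (end < g->n && strncmp(g->sa[end].txt, pat, m) == 0)
        end++;

    *first = lo;
    return end - lo;
}
```

Each match records both the input string it came from and the offset within that string, which is what makes the array "generalized". A caller builds the array once over all sequences, queries it with gsa_find, and releases the sa member with free when done.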

Documentation

Comprehensive documentation for gsuffix is available at http://gsuffix.sf.net/gsuffix/index.html.

Distribution

The source code of the library can be downloaded from the project's download area at SourceForge. After unpacking the archive, compile gsuffix with the usual configure and make steps. Please refer to the included README for more details. The archive also includes several example applications, located in the gsapps folder.

Miscellaneous

The initial release of the gsuffix library encompasses algorithms from the following papers:


Hosted by SourceForge.net (see project pages).