Handling FASTA Files


Functions

int fasta_read (FILE *fh, int(*sequence_function)(char *name, char *sequence, int sequence_len, void *userdata), void *userdata)
 Read and parse a FASTA file.
struct labeled_fasta * fasta_read_labeled (FILE *fh)
 Parse and read a labeled FASTA file.
void fasta_free_labeled (struct labeled_fasta *fasta)
 Frees all memory associated with the given labeled fasta instance.
struct plain_fasta * fasta_read_plain (FILE *fh)
 Parses the FASTA file from the given filehandle and returns the contents.
void fasta_free_plain (struct plain_fasta *fasta)
 Free all memory associated with the given fasta instance.

Detailed Description

One of the main purposes of the gsuffix library is the analysis of biological sequences, such as DNA or protein sequences. Biological sequences are often stored using the FASTA format. The following functions can be used to load such sequences.

Function Documentation

void fasta_free_labeled ( struct labeled_fasta *  fasta  )

Frees all memory associated with the given labeled fasta instance.

Parameters:
fasta the instance to be freed.

References labeled_sequence::name, labeled_fasta::num, labeled_sequence::seq, and labeled_fasta::seqs.

void fasta_free_plain ( struct plain_fasta *  fasta  )

Free all memory associated with the given fasta instance.

Parameters:
fasta as created by fasta_read_plain().
Examples:
gsdemo.c, measure.c, and testeq.c.

References plain_sequence::name, plain_fasta::num, plain_sequence::seq, and plain_fasta::seqs.

int fasta_read ( FILE *  fh,
int(*)(char *name, char *sequence, int sequence_len, void *userdata)  sequence_function,
void *  userdata 
)

Read and parse a FASTA file.

This is the most general function to parse FASTA files. The format of a FASTA file is:

 >NAME
 SEQUENCE

For every sequence in the file, the given callback function is called with its name and sequence parameters set to the corresponding NAME and SEQUENCE entries. Note that SEQUENCE may span multiple lines. The callback may return 0 to signal an error; the error condition is then propagated back to the caller of this function in the form of a failure return code.

The prototype of the callback function should look like:

        int sequence_function(char *name, char *sequence, int sequence_len, void *userdata);

Parameters:
fh defines the filehandle as opened by fopen().
sequence_function defines a function to be called for every entry.
userdata passed on to the sequence function.
Returns:
1 on success. 0 on failure.
Examples:
motif.c.

References LINE_LENGTH.

Referenced by fasta_read_labeled() and fasta_read_plain().
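
As an illustration of the callback mechanism, the following minimal sketch shows a callback that counts sequences and sums their lengths. The struct seq_stats and the function count_sequence are hypothetical names introduced here; only the prototype comes from the documentation above. Returning 1 to continue is an assumption based on the statement that 0 signals an error.

```c
#include <stddef.h>

/* Hypothetical accumulator handed to the callback through userdata. */
struct seq_stats {
    int  count;      /* number of sequences seen so far */
    long total_len;  /* sum of all sequence lengths */
};

/* Callback matching the documented prototype.
 * Returns 1 to continue (assumed convention) and 0 to signal an error. */
int count_sequence(char *name, char *sequence, int sequence_len, void *userdata)
{
    struct seq_stats *stats = userdata;

    (void)sequence; /* only the length is used in this sketch */

    if (name == NULL || stats == NULL || sequence_len < 0)
        return 0;   /* treated as a failure by the caller */

    stats->count++;
    stats->total_len += sequence_len;
    return 1;
}

/* Intended use, assuming the gsuffix header that declares fasta_read()
 * is included and fh was opened with fopen():
 *
 *     struct seq_stats stats = { 0, 0 };
 *     if (!fasta_read(fh, count_sequence, &stats))
 *         fprintf(stderr, "could not parse FASTA input\n");
 */
```

Passing the accumulator through userdata keeps the callback free of global state, so the same function can serve several files with separate counters.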

struct labeled_fasta* fasta_read_labeled ( FILE *  fh  )

Parse and read a labeled FASTA file.

The contents should look like:

 >nameA 1.0
 acgtacgt
 >nameB -1.0
 tgcatgca  etc.

Parameters:
fh the filehandle as returned by fopen().
Returns:
a data structure that needs to be freed via fasta_free_labeled(), or NULL on failure.

References labeled_fasta::allocated, fasta_read(), labeled_fasta::num, and labeled_fasta::seqs.
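
The header lines in the example above carry a name followed by a numeric label. How the library splits them is not documented here; the following is only a sketch of one possible approach, assuming the header text (without the leading '>') is available as a writable string and that the label is a floating-point number following the last blank. The function split_label is a hypothetical helper, not part of gsuffix.

```c
#include <stdlib.h>
#include <string.h>

/* Split a header such as "nameA 1.0" in place into its name and label.
 * Returns 1 on success. Returns 0 if no complete number follows the
 * last blank; in that case the header string is left unchanged. */
int split_label(char *header, double *label)
{
    char *sep, *end;

    if (header == NULL || label == NULL)
        return 0;

    sep = strrchr(header, ' ');   /* label is assumed to follow the last blank */
    if (sep == NULL || sep[1] == '\0')
        return 0;

    *label = strtod(sep + 1, &end);
    if (end == sep + 1 || *end != '\0')
        return 0;                 /* trailing text was not a complete number */

    *sep = '\0';                  /* terminate the name part */
    return 1;
}
```

For the header "nameB -1.0" this leaves "nameB" in the buffer and stores -1.0 in the label.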

struct plain_fasta* fasta_read_plain ( FILE *  fh  )

Parses the FASTA file from the given filehandle and returns the contents.

Basically, this is a convenience function that calls fasta_read() with a special callback that collects all sequences.

Parameters:
fh the filehandle as returned by fopen().
Returns:
a data structure that needs to be freed via fasta_free_plain(), or NULL on failure.
Examples:
gsdemo.c, measure.c, and testeq.c.

References plain_fasta::allocated, fasta_read(), plain_fasta::num, and plain_fasta::seqs.
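
To make the idea of a collecting callback concrete, here is a minimal sketch of how such a callback could store every entry in a growing array. It is not the library's implementation. The types demo_sequence and demo_fasta and all function names are hypothetical; only the field names (name, seq, num, allocated, seqs) are taken from the references listed above, and their types are assumptions.

```c
#include <stdlib.h>
#include <string.h>

/* Hypothetical layouts; only the field names come from the documentation. */
struct demo_sequence {
    char *name;
    char *seq;
};

struct demo_fasta {
    int num;                     /* number of stored sequences */
    int allocated;               /* capacity of the seqs array */
    struct demo_sequence *seqs;
};

/* Copy len bytes of src into a fresh NUL-terminated buffer. */
static char *copy_string(const char *src, size_t len)
{
    char *dst = malloc(len + 1);
    if (dst != NULL) {
        memcpy(dst, src, len);
        dst[len] = '\0';
    }
    return dst;
}

/* Callback with the fasta_read() prototype: appends one entry.
 * Returns 1 on success and 0 on invalid input or allocation failure. */
int collect_sequence(char *name, char *sequence, int sequence_len, void *userdata)
{
    struct demo_fasta *fasta = userdata;
    char *name_copy, *seq_copy;

    if (fasta == NULL || name == NULL || sequence == NULL || sequence_len < 0)
        return 0;

    if (fasta->num == fasta->allocated) {
        /* Double the capacity, starting at 8 entries. */
        int new_cap = fasta->allocated ? fasta->allocated * 2 : 8;
        struct demo_sequence *grown =
            realloc(fasta->seqs, (size_t)new_cap * sizeof *grown);
        if (grown == NULL)
            return 0;
        fasta->seqs = grown;
        fasta->allocated = new_cap;
    }

    name_copy = copy_string(name, strlen(name));
    seq_copy = copy_string(sequence, (size_t)sequence_len);
    if (name_copy == NULL || seq_copy == NULL) {
        free(name_copy);
        free(seq_copy);
        return 0;
    }

    fasta->seqs[fasta->num].name = name_copy;
    fasta->seqs[fasta->num].seq = seq_copy;
    fasta->num++;
    return 1;
}

/* Release all collected entries; the demo_fasta object itself is caller-owned. */
void demo_fasta_free(struct demo_fasta *fasta)
{
    int i;

    if (fasta == NULL)
        return;
    for (i = 0; i < fasta->num; i++) {
        free(fasta->seqs[i].name);
        free(fasta->seqs[i].seq);
    }
    free(fasta->seqs);
    fasta->seqs = NULL;
    fasta->num = 0;
    fasta->allocated = 0;
}
```

Copying the name and sequence inside the callback assumes that the buffers passed in are only valid for the duration of the call, which is the cautious choice when the caller's buffer lifetime is not documented.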


Generated on Wed May 27 16:40:40 2009 for gsuffix-1.0.0 by  doxygen 1.5.9