FASTA File Documentation


Overview

Feature                         Value
File Extensions                 .fasta, .fa, .mpfa, .fna, .fsa
MIME Type                       text/x-fasta
Primary Usage                   Storing biological sequences
Type of Data                    Nucleotide sequences, amino acid sequences
Header Line Indicator           > symbol at the start of the line, preceding the description
Sequence Data                   Nucleotides (A, C, G, T/U) or amino acids (single-letter codes)
Character Encoding              ASCII
Line Width in Files             Typically 60-80 characters (not strictly enforced)
File Creation Software          Various bioinformatics tools (e.g., BLAST, Clustal)
Support for Multiple Sequences  Yes
Usage in Databases              NCBI, EMBL, DDBJ
Compression                     Often gzipped (.fasta.gz)
Comment Lines                   Start with ; (uncommon; not recognized by all tools)
Blank Lines                     Generally ignored by parsers, but not recommended
Special Characters              N (any or unknown nucleotide), X (any or unknown amino acid)
Case Sensitivity                Upper and lower case letters are accepted; lower case often marks soft-masked regions, but the meaning varies by tool
Modifications and Annotations   Limited; use other formats (e.g., GenBank) for detailed annotations
File Concatenation              Simple, because each record starts with its own header line; make sure every file ends with a newline before concatenating
Space within Sequences          Spaces are not allowed in the sequence data
Popularity                      Widely used in bioinformatics and computational biology
Origins                         Developed in the 1980s for the FASTA sequence alignment software
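
Example: Reading a FASTA File

The structural rules listed above (a > header line, sequence data wrapped over several lines, optional ; comment lines, tolerated blank lines, and optional gzip compression) can be illustrated with a minimal Python sketch. It is not a reference implementation: the function names read_fasta and open_fasta are hypothetical, the choice to skip sequence lines that appear before any header is an assumption, and compression is detected from the file extension only. It uses nothing beyond the Python standard library.

```python
import gzip
import io
from typing import Iterator, TextIO, Tuple


def read_fasta(handle: TextIO) -> Iterator[Tuple[str, str]]:
    """Yield (header, sequence) pairs from an open FASTA text stream.

    Hypothetical helper for illustration; not part of any library.
    """
    header = None
    chunks = []
    for raw in handle:
        line = raw.strip()
        if not line or line.startswith(";"):
            continue  # skip blank lines and old-style comment lines
        if line.startswith(">"):
            if header is not None:
                # a new header closes the previous record
                yield header, "".join(chunks)
            header = line[1:].strip()
            chunks = []
        elif header is not None:
            # sequence data may be wrapped over many lines; join them
            chunks.append(line)
        # assumption: sequence lines before the first header are ignored
    if header is not None:
        yield header, "".join(chunks)


def open_fasta(path: str) -> TextIO:
    """Open a plain or gzipped FASTA file as ASCII text.

    Assumption: compression is inferred from a .gz file extension.
    """
    if path.endswith(".gz"):
        return gzip.open(path, "rt", encoding="ascii")
    return open(path, "r", encoding="ascii")


# Small in-memory example with two records, a blank line, and a comment.
example = io.StringIO(
    ">seq1 example nucleotide record\n"
    "ACGTNNacgt\n"
    "\n"
    "ACGT\n"
    "; an old-style comment line\n"
    ">seq2 example protein record\n"
    "MKVLX\n"
)
records = list(read_fasta(example))
```

Here records holds two (header, sequence) pairs: the first sequence is ACGTNNacgtACGT, with the wrapped lines joined and letter case preserved, and the second is MKVLX. For a file on disk, the same generator could be driven with "with open_fasta(path) as handle: for header, seq in read_fasta(handle): ...". Established libraries handle more edge cases and are usually the better choice for real work.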