Create A nucleotide sequence

String

There are many different string types in Rust, but the two most common ones are String and &str. Both can be used to store nucleotide sequences, but they have different characteristics. Usually, use String if you intend to mutate the sequence, otherwise use &str. For more information, visit the rust docs for String and &str respectively.

fn main() {
    let nt_string: String = "ACGT".to_string();
    let nt_string: &str = "ACGT";
}

Byte slice

Usually when reading nucleotide sequences from a FASTA/Q file, we get it as a byte slice, &[u8], which is a more convenient format.

fn main() {
    let nt_string: &[u8] = b"ACGT";

    println!("{:?}", nt_string);
}

Run the code and examine the output. We get a bunch of numbers. This is the ASCII representation of our nucleotides, where A/T/C/G corresponds to an 8-bit representation. For more information, visit this link.

We can check that the following representations are equivalent:

fn main() {
    assert_eq!(b'A', 65);
    assert_eq!(b'C', 67);
    assert_eq!(b'G', 71);
    assert_eq!(b'T', 84);
}

Binary

Will be covered in a later section. In short, using 8-bits is overkill for representing only four nucleotides. Instead, we can map A/C/G/T to the corresponding binary representation:

  • A => 00
  • C => 01
  • G => 10
  • T => 11