Create A nucleotide sequence
String
There are many different string types in Rust, but the two most common ones are String and &str. Both can be used to store nucleotide sequences, but they have different characteristics. Usually, use String if you intend to mutate the sequence, otherwise use &str. For more information, visit the rust docs for String and &str respectively.
fn main() { let nt_string: String = "ACGT".to_string(); let nt_string: &str = "ACGT"; }
Byte slice
Usually when reading nucleotide sequences from a FASTA/Q file, we get it as a byte slice, &[u8], which is a more convenient format.
fn main() { let nt_string: &[u8] = b"ACGT"; println!("{:?}", nt_string); }
Run the code and examine the output. We get a bunch of numbers. This is the ASCII representation of our nucleotides, where A/T/C/G corresponds to an 8-bit representation. For more information, visit this link.
We can check that the following representations are equivalent:
fn main() { assert_eq!(b'A', 65); assert_eq!(b'C', 67); assert_eq!(b'G', 71); assert_eq!(b'T', 84); }
Binary
Will be covered in a later section. In short, using 8-bits is overkill for representing only four nucleotides. Instead, we can map A/C/G/T to the corresponding binary representation:
A=>00C=>01G=>10T=>11