Fasta Parser
Problem Description
Build a CLI tool, fasta_stats, that counts the sequences in a provided FASTA file and writes the result to a JSON file.
- Input argument: --fasta <file.fasta>
- Output argument: --outfile <outfile.json>
Expected command:
fasta_stats --fasta <file.fasta> --outfile <outfile.json>
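As a rough illustration of what the two flags carry, here is a minimal sketch that parses them with the standard library only. The checklist asks for clap, which would replace all of this with a derived struct; the names Args and parse_args are hypothetical and chosen only for this example.

```rust
use std::path::PathBuf;

/// Parsed command-line arguments (hypothetical struct, not from the exercise).
#[derive(Debug, PartialEq)]
struct Args {
    fasta: PathBuf,
    outfile: PathBuf,
}

/// Parse `--fasta <path>` and `--outfile <path>` from an argument iterator
/// whose program name has already been skipped.
/// Assumption: errors are reported as plain strings to keep the sketch short.
fn parse_args<I: Iterator<Item = String>>(mut args: I) -> Result<Args, String> {
    let mut fasta: Option<PathBuf> = None;
    let mut outfile: Option<PathBuf> = None;

    while let Some(flag) = args.next() {
        // Pick the slot this flag fills, rejecting anything unknown.
        let slot = match flag.as_str() {
            "--fasta" => &mut fasta,
            "--outfile" => &mut outfile,
            other => return Err(format!("unknown argument: {}", other)),
        };
        // Every flag must be followed by a value.
        let value = args
            .next()
            .ok_or_else(|| format!("missing value for {}", flag))?;
        *slot = Some(PathBuf::from(value));
    }

    Ok(Args {
        fasta: fasta.ok_or("missing --fasta")?,
        outfile: outfile.ok_or("missing --outfile")?,
    })
}
```

In a binary this could be called as parse_args(std::env::args().skip(1)), printing the error and exiting with a non-zero status on failure.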
Expected contents of output file:
{
    num_sequences: usize,
}
Checklist
- Define a struct for CLI arguments with clap.
- Define a struct FastaStats for storing calculated stats.
- Define a function get_fasta_stats that takes fasta: PathBuf and outfile: PathBuf as arguments.
  - Iterate over fasta records and keep track of num_sequences.
  - Store num_sequences in an instance of FastaStats.
- Define a function write_json that writes the FastaStats instance to outfile.
Suggested Rust Crates
- Clap - argument parsing.
- Needletail - reading FASTA files.
- Serde - serializing data.
- Serde JSON - serializing to JSON.
Code Examples
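The checklist steps can be sketched with the standard library alone. This is not the intended solution: Needletail would replace the hand-rolled line scan and Serde the hand-written JSON. It assumes every record starts with exactly one '>' header line. The helpers count_records and to_json are hypothetical, and the return type of get_fasta_stats is an assumption, since the exercise only fixes its arguments.

```rust
use std::fs::File;
use std::io::{self, BufRead, BufReader, Write};
use std::path::{Path, PathBuf};

/// Stats collected from one FASTA file.
#[derive(Debug, Default, PartialEq)]
struct FastaStats {
    num_sequences: usize,
}

/// Count records in any buffered reader.
/// Assumption: each record begins with one header line starting with '>'.
fn count_records<R: BufRead>(reader: R) -> io::Result<FastaStats> {
    let mut stats = FastaStats::default();
    for line in reader.lines() {
        if line?.starts_with('>') {
            stats.num_sequences += 1;
        }
    }
    Ok(stats)
}

/// Render the stats as a JSON object by hand (a stand-in for serde_json).
fn to_json(stats: &FastaStats) -> String {
    format!("{{\n  \"num_sequences\": {}\n}}\n", stats.num_sequences)
}

/// Write the JSON rendering of `stats` to `outfile`.
fn write_json(stats: &FastaStats, outfile: &Path) -> io::Result<()> {
    File::create(outfile)?.write_all(to_json(stats).as_bytes())
}

/// Read `fasta`, count its records, and write the result to `outfile`.
/// Returning the stats as well is an assumption made for easier testing.
fn get_fasta_stats(fasta: PathBuf, outfile: PathBuf) -> io::Result<FastaStats> {
    let stats = count_records(BufReader::new(File::open(&fasta)?))?;
    write_json(&stats, &outfile)?;
    Ok(stats)
}
```

Keeping count_records generic over BufRead lets it be exercised on an in-memory byte slice, without touching the file system.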
Extra Credits
- Only allow input files with extensions .fasta, .fa, .fsa, .fna.
- Add more stats: average sequence length, average GC content and total number of bases.
- Graceful error handling.
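One possible shape for the extra credits, again as a std-only sketch under stated assumptions. The names has_fasta_extension, ExtendedStats and extended_stats are hypothetical. "Average GC content" is interpreted here as the fraction of G and C among all bases in the file (a length-weighted average); a per-sequence mean would be an equally valid reading. Errors are returned as io::Result values rather than panicking, which is one simple way to keep error handling graceful.

```rust
use std::io::{self, BufRead};
use std::path::Path;

/// Extensions accepted as FASTA input, taken from the list above.
const FASTA_EXTENSIONS: [&str; 4] = ["fasta", "fa", "fsa", "fna"];

/// True if `path` ends in one of the accepted extensions.
/// Assumption: the comparison is case-sensitive.
fn has_fasta_extension(path: &Path) -> bool {
    path.extension()
        .and_then(|ext| ext.to_str())
        .map_or(false, |ext| FASTA_EXTENSIONS.contains(&ext))
}

/// Raw counters from which the averages are derived (hypothetical struct).
#[derive(Debug, Default, PartialEq)]
struct ExtendedStats {
    num_sequences: usize,
    total_bases: usize,
    gc_bases: usize,
}

impl ExtendedStats {
    /// Mean sequence length, or 0.0 for an empty file.
    fn avg_length(&self) -> f64 {
        if self.num_sequences == 0 {
            0.0
        } else {
            self.total_bases as f64 / self.num_sequences as f64
        }
    }

    /// Fraction of G/C among all bases, or 0.0 when there are no bases.
    fn gc_content(&self) -> f64 {
        if self.total_bases == 0 {
            0.0
        } else {
            self.gc_bases as f64 / self.total_bases as f64
        }
    }
}

/// Scan FASTA text: '>' lines start a record, all other lines hold bases.
fn extended_stats<R: BufRead>(reader: R) -> io::Result<ExtendedStats> {
    let mut stats = ExtendedStats::default();
    for line in reader.lines() {
        let line = line?;
        if line.starts_with('>') {
            stats.num_sequences += 1;
        } else {
            // trim() drops a trailing '\r' from files with Windows line endings.
            for base in line.trim().bytes() {
                stats.total_bases += 1;
                if matches!(base, b'G' | b'C' | b'g' | b'c') {
                    stats.gc_bases += 1;
                }
            }
        }
    }
    Ok(stats)
}
```

Storing raw counts and deriving the averages on demand avoids accumulating floating-point error while reading.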