Tsaari

Reputation: 103

How to set up a function that sensibly uses the fastq crate to modify fastq files

I'm learning Rust, and I have a simple program that I hope to use as a learning exercise. My goal here is to get a better idea of the proper way of doing things.

I'm trying to use the fastq crate to do some manipulation of fastq files - these are simple text files that are standard for storing high throughput [DNA/RNA] sequencing data. What I would like to do is modify a subset of the entries, and write the modified entries to a file.

Right now I have an initial proof of concept below that works to modify the first byte of each of the quality values. I want to determine if I'm doing things sensibly or not. My general thought process:

  1. Use fastq::RefRecord::to_owned_record() to create a copy of the Record that I'll have ownership of
  2. Create a bytes::BytesMut from the quality field byte slice
  3. Modify the bytes of the BytesMut
  4. Set the quality field in my OwnedRecord from the modified BytesMut
  5. Verify that it's been set

Does the process above and the code below seem sensible? Are there things you'd do differently, and why? Since I'm so new to Rust, I feel that I'm probably looking at things in a naive and suboptimal way. Let me know your general thoughts.

Dependencies:

Here is my example code:

use bytes::BytesMut;
use fastq::{Parser, Record, RefRecord};

const READS: &str = r#"@read1/ENST00000266263.10;mate1:84-183;mate2:264-363
GACAGCCAGGGGCCAGCGGGTGGCAGTGCCCAGGACATAGAGAGAGGCAGCACACACGCGGTTGATGGTGAAGCCCGGAATGGCCACAGAGGCTAGAGCC
+
IIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIII
@read2/ENST00000266263.10;mate1:163-262;mate2:283-382
GATGCCATTGACAAAGGCAAGAAGGCTGGAGAGGTGCCCAGCCCTGAAGCAGGCCGCAGCGCCAGGGTGACTGTGGCTGTGGTGGACACCTTTGTATGGC
+
IIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIII
@read3/ENST00000266263.10;mate1:86-185;mate2:265-364
GGACAGCCAGGGGCCAGCGGGTGGCAGTGCCCAGGACATAGAGAGAGGCAGCANACACACGGTTGATGGTGAAGCCCGGAATGGCCACAGAGGCTAGAGC
+
IIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIII!IIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIII
@read4/ENST00000266263.10;mate1:297-396;mate2:401-500
CAGGAGGAGCTGGGCTTCCCCACTGTTAGGTAGAGCTTGCGCAGGCTGGAGTCCAGGAGGAAATCCACCGACCTGTCAATGGGGTGGATAATGATGGGGA
+
IIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIII
"#;

fn main() {

    let mut total: usize = 0;

    let parser = Parser::new(READS.as_bytes());
    parser.each(|record| {
        println!("Before mod");
        println!("{}", String::from_utf8_lossy(record.qual()));
        let mut owned_rec = RefRecord::to_owned_record(&record);
        let mut curr_bytes = BytesMut::from(owned_rec.qual());
        curr_bytes[0] = b'$';
        owned_rec.qual = curr_bytes.to_vec();
        println!("After mod");
        println!("{}", String::from_utf8_lossy(owned_rec.qual()));
        total += 1;
        true
    }).expect("Invalid fastq file");
    println!("{}", total);
}

In addition, I'm hoping to move this process into a function, so I'm wondering about the best way to do that. I'm thinking of a function modify_qual that could perform this for me. What would your function signature look like for something like this? Maybe something that consumes the record and returns the modified record? Something else? I understand mutability and passing by reference at a surface level, but I don't have a good mental model of when to use which.

Thanks in advance for the help!

Upvotes: 1

Views: 138

Answers (1)

ozkanpakdil

Reputation: 4602

About your question "What would your function signature look like": signatures tend to change as the code develops, so there is no need to spend too much time on it up front.
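To make the options concrete, here is a minimal sketch of three shapes the signature could take. `Rec` is a hypothetical stand-in for `fastq::OwnedRecord` (so the snippet compiles without the crate), and the function names are made up for illustration. It relies only on the quality field being a `Vec<u8>`, which your own assignment `owned_rec.qual = curr_bytes.to_vec()` already implies, so the byte can be edited directly without going through `BytesMut`.

```rust
// Hypothetical stand-in for fastq::OwnedRecord; fields mirror one FASTQ entry.
#[derive(Debug, Clone, PartialEq)]
struct Rec {
    head: Vec<u8>,
    seq: Vec<u8>,
    qual: Vec<u8>,
}

// Option 1: borrow mutably and edit in place.
// No extra allocation; the caller keeps ownership of the record.
fn mask_first_qual_in_place(rec: &mut Rec) {
    // first_mut() returns None for an empty quality string,
    // so this cannot panic the way `qual[0] = ...` would.
    if let Some(first) = rec.qual.first_mut() {
        *first = b'$';
    }
}

// Option 2: consume the record and hand it back modified.
fn mask_first_qual(mut rec: Rec) -> Rec {
    mask_first_qual_in_place(&mut rec);
    rec
}

// Option 3: borrow immutably and return a modified copy,
// leaving the original untouched (costs one clone).
fn masked_copy(rec: &Rec) -> Rec {
    mask_first_qual(rec.clone())
}
```

As a rule of thumb, I would reach for option 1 when you already own the record, option 2 when the function is one step in a pipeline, and option 3 only when you still need the original afterwards.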

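I tried two approaches; the code is below. modify_qual takes the record and returns the modified OwnedRecord, while modify_qual1 moves the whole closure body (printing and counting included) into a function.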

use bytes::BytesMut;
use fastq::{Parser, Record, RefRecord};

const READS: &str = r#"@read1/ENST00000266263.10;mate1:84-183;mate2:264-363
GACAGCCAGGGGCCAGCGGGTGGCAGTGCCCAGGACATAGAGAGAGGCAGCACACACGCGGTTGATGGTGAAGCCCGGAATGGCCACAGAGGCTAGAGCC
+
IIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIII
@read2/ENST00000266263.10;mate1:163-262;mate2:283-382
GATGCCATTGACAAAGGCAAGAAGGCTGGAGAGGTGCCCAGCCCTGAAGCAGGCCGCAGCGCCAGGGTGACTGTGGCTGTGGTGGACACCTTTGTATGGC
+
IIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIII
@read3/ENST00000266263.10;mate1:86-185;mate2:265-364
GGACAGCCAGGGGCCAGCGGGTGGCAGTGCCCAGGACATAGAGAGAGGCAGCANACACACGGTTGATGGTGAAGCCCGGAATGGCCACAGAGGCTAGAGC
+
IIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIII!IIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIII
@read4/ENST00000266263.10;mate1:297-396;mate2:401-500
CAGGAGGAGCTGGGCTTCCCCACTGTTAGGTAGAGCTTGCGCAGGCTGGAGTCCAGGAGGAAATCCACCGACCTGTCAATGGGGTGGATAATGATGGGGA
+
IIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIII
"#;

fn main() {
    let mut total: usize = 0;

    let parser = Parser::new(READS.as_bytes());
    parser
        .each(|record| {
            // Approach 2: replace the rest of this closure body with
            // modify_qual1(record, &mut total)
            println!("Before mod");
            println!("{}", String::from_utf8_lossy(record.qual()));
            let owned_rec = modify_qual(record);
            println!("After mod");
            println!("{}", String::from_utf8_lossy(owned_rec.qual()));
            total += 1;
            true
        })
        .expect("Invalid fastq file");
    println!("{}", total);
}

// Approach 1: take the borrowed record and return a modified, owned copy.
fn modify_qual(record: RefRecord) -> fastq::OwnedRecord {
    let mut owned_rec = RefRecord::to_owned_record(&record);
    let mut curr_bytes = BytesMut::from(owned_rec.qual());
    curr_bytes[0] = b'$';
    owned_rec.qual = curr_bytes.to_vec();
    owned_rec
}

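// Approach 2: do everything (printing, modifying, counting) inside the function.
// Returns true so the parser continues with the next record.
#[allow(dead_code)]
fn modify_qual1(record: RefRecord, total: &mut usize) -> bool {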
    println!("Before mod");
    println!("{}", String::from_utf8_lossy(record.qual()));
    let mut owned_rec = RefRecord::to_owned_record(&record);
    let mut curr_bytes = BytesMut::from(owned_rec.qual());
    curr_bytes[0] = b'$';
    owned_rec.qual = curr_bytes.to_vec();
    println!("After mod");
    println!("{}", String::from_utf8_lossy(owned_rec.qual()));
    *total += 1;
    true
}
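Since your end goal is to modify a subset of the entries and write them to a file, here is a rough sketch of that last step using only the standard library. Again, `Rec` is a hypothetical stand-in for the crate's owned record, `write_fastq` and `write_modified` are names I made up, and I am assuming the header is stored without the leading `@`. Before hand-rolling this, it is worth checking the fastq docs for a `write` method on `Record`, which may already cover the output part.

```rust
use std::io::{self, Write};

// Hypothetical stand-in for fastq::OwnedRecord.
// Assumption: `head` does not include the leading '@'.
struct Rec {
    head: Vec<u8>,
    seq: Vec<u8>,
    qual: Vec<u8>,
}

// Write one entry in the four-line FASTQ layout:
// @header, sequence, '+', quality.
fn write_fastq<W: Write>(out: &mut W, rec: &Rec) -> io::Result<()> {
    out.write_all(b"@")?;
    out.write_all(&rec.head)?;
    out.write_all(b"\n")?;
    out.write_all(&rec.seq)?;
    out.write_all(b"\n+\n")?;
    out.write_all(&rec.qual)?;
    out.write_all(b"\n")
}

// Keep only the records accepted by `select`, overwrite the first
// quality byte of each, and write them out. Returns how many were written.
fn write_modified<W, P>(out: &mut W, recs: Vec<Rec>, mut select: P) -> io::Result<usize>
where
    W: Write,
    P: FnMut(&Rec) -> bool,
{
    let mut written = 0;
    for mut rec in recs {
        if !select(&rec) {
            continue; // not part of the subset: skip it entirely
        }
        if let Some(first) = rec.qual.first_mut() {
            *first = b'$';
        }
        write_fastq(out, &rec)?;
        written += 1;
    }
    Ok(written)
}
```

Because `out` is any `Write`, you can pass a `Vec<u8>` while testing and a `BufWriter` around a `File` for real output. With the actual parser you would call the modify-and-write part from inside the `each` closure instead of collecting records into a `Vec` first.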

Upvotes: 2
