ketan

Reputation: 2904

How to read and process a pipe delimited file in Rust?

I want to read a pipe delimited file, process the data, and generate a result in CSV format.

Input file data

A|1|Pass
B|2|Fail
A|3|Fail
C|6|Pass
A|8|Pass
B|10|Fail
C|25|Pass
A|12|Fail
C|26|Pass
C|26|Fail

I want to group the rows by column 1 and column 3 and compute the sum of column 2 for each group.

I'm stuck at the point of how to maintain the records to apply the group by function:

use std::fs::File;
use std::io::{BufReader};
use std::io::{BufRead};
use std::collections::HashMap;

fn say_hello(id: &str, value: i32, no_change : i32) {

    if no_change == 101 {
        let mut data = HashMap::new();
    }
    if value == 0 {
        if data.contains_key(id) {
            for (key, value) in &data {
                if value.is_empty() {

                }
            }
        } else {
            data.insert(id,"");
        }
    } else if value == 2 {
        if data.contains_key(id) {
            for (key, value) in &data {
                if value.is_empty() {

                } else {

                }
            }
        } else {
            data.insert(id,"");
        }
    }
}

fn main() {

    let f = File::open("sample2.txt").expect("Unable to open file");
    let br = BufReader::new(f);
    let mut no_change = 101;
    for line in br.lines() {
        let mut index = 0;
        for value in line.unwrap().split('|') {
            say_hello(&value,index,no_change);
            index = index + 1;
        }
    }
}

I'm expecting a result like:

name,result,num
A,Fail,15
A,Pass,9
B,Fail,12
C,Fail,26
C,Pass,57

Is there a specific technique for reading a pipe-delimited file and processing the data as shown above? I can do this with Python's pandas, but I want to do it in Rust.

Upvotes: 0

Views: 2280

Answers (2)

Shepmaster

Reputation: 431589

As was mentioned, use the csv crate to do the heavy lifting of parsing the file. Then group the rows in a BTreeMap keyed on the (name, result) pair, which also keeps the keys in sorted order. The entry API inserts or updates each running sum with a single lookup.

extern crate csv;
extern crate rustc_serialize;

use std::fs::File;
use std::collections::BTreeMap;

#[derive(Debug, RustcDecodable)]
struct Record {
    name: String,
    value: i32,
    passed: String,
}

fn main() {
    let file = File::open("input").expect("Couldn't open input");
    let mut csv_file = csv::Reader::from_reader(file).delimiter(b'|').has_headers(false);

    let mut sums = BTreeMap::new();
    for record in csv_file.decode() {
        let record: Record = record.expect("Could not parse input file");
        let key = (record.name, record.passed);
        *sums.entry(key).or_insert(0) += record.value;
    }

    println!("name,result,num");
    for ((name, passed), sum) in sums {
        println!("{},{},{}", name, passed, sum);
    }
}

The output matches the expected result:

name,result,num
A,Fail,15
A,Pass,9
B,Fail,12
C,Fail,26
C,Pass,57
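
The listing above targets an older release of the csv crate that decoded rows through rustc_serialize; csv 1.x moved to serde and a ReaderBuilder, so it may not compile unchanged against a current version. As a dependency-free illustration of the same BTreeMap and entry API grouping, here is a minimal sketch that uses only the standard library. The helper names sum_by_group and to_csv are hypothetical, the input is an in-memory string rather than a file, and the sketch assumes fields never contain a pipe or quoting (which the csv crate would handle for you).

```rust
use std::collections::BTreeMap;
use std::io::BufRead;

/// Groups `name|value|result` rows by (name, result) and sums `value`.
/// Hypothetical helper: unreadable lines, lines with fewer than three
/// fields, and lines whose second field is not an integer are skipped.
fn sum_by_group<R: BufRead>(reader: R) -> BTreeMap<(String, String), i32> {
    let mut sums = BTreeMap::new();
    for line in reader.lines() {
        let line = match line {
            Ok(line) => line,
            Err(_) => continue,
        };
        let mut fields = line.split('|').map(str::trim);
        if let (Some(name), Some(value), Some(result)) =
            (fields.next(), fields.next(), fields.next())
        {
            if let Ok(value) = value.parse::<i32>() {
                // The entry API inserts 0 for a new key, then adds in place.
                *sums
                    .entry((name.to_string(), result.to_string()))
                    .or_insert(0) += value;
            }
        }
    }
    sums
}

/// Renders the grouped sums as CSV text with a header row.
/// BTreeMap iteration is ordered by key, so no explicit sort is needed.
fn to_csv(sums: &BTreeMap<(String, String), i32>) -> String {
    let mut out = String::from("name,result,num\n");
    for ((name, result), sum) in sums {
        out.push_str(&format!("{},{},{}\n", name, result, sum));
    }
    out
}

fn main() {
    let data = "A|1|Pass\nB|2|Fail\nA|3|Fail\nC|6|Pass\nA|8|Pass\n\
                B|10|Fail\nC|25|Pass\nA|12|Fail\nC|26|Pass\nC|26|Fail\n";
    // `&[u8]` implements BufRead, so the string's bytes can be read directly.
    let sums = sum_by_group(data.as_bytes());
    print!("{}", to_csv(&sums));
}
```

To read from a file instead, wrap it in a BufReader, for example sum_by_group(BufReader::new(File::open("input")?)). Skipping malformed lines silently is a choice made for brevity here; real input may warrant reporting them.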

Upvotes: 2

swizard

Reputation: 2701

I'd suggest something like this:

use std::str;
use std::collections::HashMap;
use std::io::{BufReader, BufRead, Cursor};

fn main() {
    let data = "
A|1|Pass
B|2|Fail
A|3|Fail
C|6|Pass
A|8|Pass
B|10|Fail
C|25|Pass
A|12|Fail
C|26|Pass
C|26|Fail";
    let lines = BufReader::new(Cursor::new(data))
        .lines()
        .flat_map(Result::ok)
        .flat_map(parse_line);
    for ((fa, fb), s) in group(lines) {
        println!("{}|{}|{}", fa, fb, s);
    }
}

type ParsedLine = ((String, String), usize);

fn parse_line(line: String) -> Option<ParsedLine> {
    let mut fields = line
        .split('|')
        .map(str::trim);
    if let (Some(fa), Some(fb), Some(fc)) = (fields.next(), fields.next(), fields.next()) {
        fb.parse()
            .ok()
            .map(|v| ((fa.to_string(), fc.to_string()), v))
    } else {
        None
    }
}

fn group<I>(input: I) -> Vec<ParsedLine> where I: Iterator<Item = ParsedLine> {
    let mut table = HashMap::new();
    for (k, v) in input {
        // `or_insert` already returns a mutable reference, so the binding need not be `mut`.
        let sum = table.entry(k).or_insert(0);
        *sum += v;
    }
    let mut output: Vec<_> = table
        .into_iter()
        .collect();
    output.sort_by(|a, b| a.0.cmp(&b.0));
    output
}


Here a HashMap is used for grouping entries and then results are moved to a Vec for sorting.

Upvotes: 1
