Reputation: 195
I often perform the same analysis on log files. Initially, I had a small Awk script used with grep and sort. For fun, I rewrote it to Python:
#!/usr/bin/python3
import sys
months = { "Jan": 1, "Feb": 2, "Mar": 3, "Apr": 4, "May": 5, "Jun": 6,
"Jul": 7, "Aug": 8, "Sep": 9, "Oct": 10, "Nov": 11, "Dec": 12 }
months_r = { v:k for k,v in months.items() }
totals = {}
for line in sys.stdin:
if "redis" in line and "Partial" in line:
f1, f2 = line.split()[:2]
w = (months[f1], int(f2))
totals[w] = totals.get(w, 0) + 1
for k in sorted(totals.keys()):
print(months_r[k[0]], k[1], totals[k])
and then to Go (to avoid being too long, I will not quote the Go version here (68 lines)).
I am now trying to express it in Rust (I had only written some toy examples so far) and I am quite stuck. At first, I had tons of errors, and it is getting a bit better, but now I have one that I can not fix...
Could someone give a hand on how to express this in Rust? The Python version is quite short, so it would be nice to have something quite idiomatic and not too verbose.
Here is where I am so far, but I can not make any further progress.
use std::array::IntoIter;
use std::collections::HashMap;
use std::io;
use std::io::prelude::*;
use std::iter::FromIterator;
fn main() {
let m = HashMap::<_, _>::from_iter(IntoIter::new([
("Jan", 1),
("Feb", 2),
("Mar", 3),
("Apr", 4),
("May", 5),
("Jun", 6),
("Jul", 7),
("Aug", 8),
("Sep", 9),
("Oct", 10),
("Nov", 11),
("Dec", 12),
]));
let mut totals = HashMap::new();
for l in io::stdin().lock().lines() {
let ul = l.unwrap();
if ul.contains("redis") && ul.contains("Partial") {
let mut wi = ul.split_whitespace();
let f1 = wi.next().unwrap();
let f2 = wi.next().unwrap();
let count = totals.entry((m.get(&f1).unwrap(), f2)).or_insert(0);
*count += 1;
}
}
}
Simple hints would be much appreciated, I am not asking for a full working solution, which is more work (but I will welcome one if it comes, of course).
Thanks a lot!
Upvotes: 0
Views: 171
Reputation: 154836
The problem is that your f2
refers to data owned by the current line string ul
. However, ul
is dropped at each iteration of the loop and a new one is allocated by the lines()
iterator. If you were to insert a slice referring to ul
into the totals
hashmap, the slice would get invalidated in the next iteration of the loop and the program would crash or malfunction when you later tried to access the deallocated data.
split_whitespace()
behaves like that for efficiency: when calling it, you often only need to inspect the strings, and it would be a waste to return freshly allocated copies of everything. Instead, ul.split_whitespace()
gives out cheap slices which are effectively views into ul
, allowing you to choose whether or not to copy the returned string slices.
The solution is simple, just create an owned string from the returned slice using to_string()
. For example, this compiles:
fn main() {
let m = HashMap::<_, _>::from_iter(IntoIter::new([
("Jan", 1),
// ...
]));
let mut totals = HashMap::new();
for l in io::stdin().lock().lines() {
let ul = l.unwrap();
if ul.contains("redis") && ul.contains("Partial") {
let mut wi = ul.split_whitespace();
let f1 = wi.next().unwrap();
let f2 = wi.next().unwrap().to_string();
*totals.entry((m.get(f1).unwrap(), f2)).or_insert(0) += 1;
}
}
}
An even simpler option is to do what your Python code does and the Rust translation doesn't, which is parse f2
as integer. The integer can be copied by value into the map and you're no longer required to allocate a copy of f2
:
for l in io::stdin().lock().lines() {
let ul = l.unwrap();
if ul.contains("redis") && ul.contains("Partial") {
let mut wi = ul.split_whitespace();
let f1 = wi.next().unwrap();
let f2 = wi.next().unwrap();
let w = (m.get(f1).unwrap(), f2.parse::<u32>().unwrap());
*totals.entry(w).or_insert(0) += 1;
}
}
Finally, the sorting and printing should be achieved with a reasonably straightforward translation of the original Python:
let months_r: HashMap<_, _> = m.iter().map(|(k, v)| (v, k)).collect();
let mut totals_keys: Vec<_> = totals.keys().collect();
totals_keys.sort();
for k in totals_keys {
println!(
"{} {} {}", months_r.get(k.0).unwrap(),
k.1, totals.get(k).unwrap()
);
}
All in all, 48 lines with the stock rustfmt
- longer than Python, but still shorter than Go.
Upvotes: 1