mbrt

Reputation: 2178

How do I write a function that takes both owned and non-owned string collections?

I'm having trouble writing a function that takes a collection of strings as a parameter. My function looks like this:

type StrList<'a> = Vec<&'a str>;

fn my_func(list: &StrList) {
    for s in list {
        println!("{}", s);
    }
}

All goes well if I pass a Vec<&'a str> to the function, as expected. However, if I pass a Vec<String> the compiler complains:

error[E0308]: mismatched types
  --> src/main.rs:13:13
   |
13 |     my_func(&v2);
   |             ^^^ expected &str, found struct `std::string::String`
   |
   = note: expected type `&std::vec::Vec<&str>`
   = note:    found type `&std::vec::Vec<std::string::String>`

This is the main function I used:

fn main() {
    let v1 = vec!["a", "b"];
    let v2 = vec!["a".to_owned(), "b".to_owned()];
    my_func(&v1);
    my_func(&v2);
}

My function is not able to take vectors of owned strings. Conversely, if I change the StrList type to:

type StrList = Vec<String>;

The first call fails, and the second works.

A possible solution is to produce a Vec<&'a str> from v2 in this way:

let v2_1: Vec<&str> = v2.iter().map(|s| s.as_str()).collect();

But this seems very odd to me. my_func should not care about the ownership of the strings.

What kind of signature should I use for my_func to support both vectors of owned strings and string references?

Upvotes: 18

Views: 2381

Answers (2)

user395760

Although String and &str are very closely related, they are not identical. Here's what your vectors look like in memory:

v1 ---> [ { 0x7890, // pointer to "a" + 7 unused bytes
            1 }     // length of "a"
          { 0x7898, // pointer to "b" + 7 unused bytes
            1 } ]   // length

v2 ---> [ { 0x1230 // pointer to "a" + 7 unused bytes (a different copy)
            8      // capacity
            1 }    // length
          { 0x1238 // pointer ...
            8      // capacity
            1 } ]  // length

Here each line is the same amount of memory (four or eight bytes depending on pointer size). You can't take the memory of one of these vectors and treat it like the other, because the memory layout doesn't match up: the items have different sizes and different layouts. For example, with four-byte pointers, if v1 stores its items starting at address X and v2 stores its items starting at address Y, then v1[1] is at address X + 8 but v2[1] is at address Y + 12.
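If you want to see the size difference on your own machine, here is a minimal sketch. The helper name element_sizes is made up for illustration, and the exact numbers are an assumption about current compilers (a &str is a pointer plus a length, a String is a pointer plus a capacity plus a length); the language does not formally promise the String layout.

```rust
use std::mem::size_of;

/// Hypothetical helper: returns (size of `&str`, size of `String`) in bytes
/// for the target this is compiled for.
fn element_sizes() -> (usize, usize) {
    // `&str` is a fat pointer: data pointer + length.
    // `String` currently holds pointer + capacity + length.
    (size_of::<&str>(), size_of::<String>())
}
```

On a typical 64-bit target this should give 16 and 24, and on a 32-bit target 8 and 12, which matches the X + 8 versus Y + 12 offsets in the example above.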

What you can do is write a generic function like this:

fn my_func<T: AsRef<str>>(list: &[T]) {
    for s in list {
        println!("{}", s.as_ref());
    }
}

Then the compiler can generate appropriate code for both &[String] and &[&str] as well as other types if they implement AsRef<str>.
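The same bound works when the function needs to return something rather than print. Below is a small sketch along those lines; join_all is a hypothetical helper, not part of the standard library, and it simply concatenates the items with ", " between them.

```rust
/// Hypothetical helper: joins all items with ", " between them.
/// Accepts any slice whose element type can be viewed as a `str`.
fn join_all<T: AsRef<str>>(list: &[T]) -> String {
    let mut out = String::new();
    for (i, s) in list.iter().enumerate() {
        if i > 0 {
            out.push_str(", ");
        }
        // `as_ref()` gives a `&str` whether `T` is `&str` or `String`.
        out.push_str(s.as_ref());
    }
    out
}
```

With v1 and v2 from the question, join_all(&v1) and join_all(&v2) both compile, because a reference to a Vec coerces to a slice of the same element type.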

Upvotes: 28

Shepmaster

Reputation: 431809

To build on delnan's great answer, I want to point out one more level of generics that you can add here. You said:

a collection of strings

But there are more types of collections than slices and vectors! In your example, you care about forward-only, one-at-a-time access to the items. This is a perfect example of an Iterator. Below, I've changed your function to accept any type that can be transformed into an iterator. You can then pass many more types of things. I've used a HashSet as an example, but note that you can also pass in v1 and v2 instead of &v1 or &v2, consuming them.

use std::collections::HashSet;

fn my_func<I>(list: I)
    where I: IntoIterator,
          I::Item: AsRef<str>,
{
    for s in list {
        println!("{}", s.as_ref());
    }
}

fn main() {
    let v1 = vec!["a", "b"];
    let v2 = vec!["a".to_owned(), "b".to_owned()];
    let v3 = {
        let mut set = HashSet::new();
        set.insert("a");
        set.insert("b");
        set.insert("a");
        set
    };
    let v4 = {
        let mut set = HashSet::new();
        set.insert("a".to_owned());
        set.insert("b".to_owned());
        set.insert("a".to_owned());
        set
    };

    my_func(&v1);
    my_func(v1);
    my_func(&v2);
    my_func(v2);
    my_func(&v3);
    my_func(v3);
    my_func(&v4);
    my_func(v4);
}
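Because the bound is on IntoIterator rather than on a concrete collection, the same signature should also accept iterators that never live in a collection at all. Here is a sketch with a hypothetical helper, total_len, that returns a value so the result is easy to check:

```rust
/// Hypothetical helper: sums the byte lengths of every string yielded
/// by `list`, which may be a collection, a reference to one, or any
/// other iterator whose items can be viewed as a `str`.
fn total_len<I>(list: I) -> usize
where
    I: IntoIterator,
    I::Item: AsRef<str>,
{
    let mut total = 0;
    for s in list {
        let text: &str = s.as_ref();
        total += text.len();
    }
    total
}
```

For instance, total_len("a,bc,def".split(',')) works directly on the iterator returned by split, without collecting into a Vec first.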

Upvotes: 5
