jerin

Reputation: 653

Unexpected copy when unpacking a returned tuple via structured bindings

I'm trying to return a std::tuple<std::string, std::vector<std::string_view>> from a function and unpack it with structured bindings. The tuple holds a string and string views pointing into that string.

Attempting to do this as follows:

#include <iostream>
#include <string>
#include <string_view>
#include <tuple>
#include <vector>

using Views = std::vector<std::string_view>;

std::tuple<std::string, Views> create(std::string sample) {
  std::string surface = sample;
  Views views;
  for (size_t i = 0; i < sample.size(); i++) {
    std::string_view view(surface.data() + i, 1);
    views.push_back(view);
  }
  return std::make_tuple(std::move(surface), std::move(views));
}

template <class T> void emit(const T &surface) {
  std::cout << "[" << reinterpret_cast<const void *>(surface.data()) << "..."
            << reinterpret_cast<const void *>(surface.data() + surface.size())
            << "]";
}

int main() {
  std::string line;
  while (getline(std::cin, line)) {
    auto [surface, views] = create(line);
    std::cout << "Surface: ";
    emit(surface);
    std::cout << "\n";
    std::cout << "Views: ";
    for (auto view : views) {
      emit(view);
      std::cout << " ";
    }
  }
  std::cout << "\n";
  return 0;
}

This gets me the following output:

$ ./a.out <<< "12"
Surface: [0x7ffdc689aa68...0x7ffdc689aa6a]
Views: [0x7ffdc689a980...0x7ffdc689a981] [0x7ffdc689a981...0x7ffdc689a982] 

On close inspection, the address of the string and the addresses the views point to are different, i.e. the views point into an older string. The string appears to have been copied and the older one destroyed.

I figure I must be doing something wrong here. Why is this wrong? What is the correct way to achieve the intent using structured bindings?

Upvotes: 1

Views: 120

Answers (2)

Guillaume Racicot

Reputation: 41840

Your problem lies in create.

Specifically, you create a view over a string:

std::string_view view(surface.data() + i, 1);

But that string is a local variable in your function:

std::string surface = sample;

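That string is moved from in the return statement, which leaves the local string in a valid but unspecified (typically empty) state, and then the local string is destroyed. Using the views after that is UB, because with the small buffer optimization (SBO) the characters of a short string live inside the string object itself, here on the stack, rather than on the heap.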

std::tuple<std::string, Views> create(std::string sample) {
  std::string surface = sample; // local string
  Views views;
  for (size_t i = 0; i < sample.size(); i++) {
    std::string_view view(surface.data() + i, 1); // view over the local string
    views.push_back(view);
  }

  // Moves from surface, leaving it in a valid but unspecified (typically empty) state.
  // The string views keep their size but may now point into an emptied buffer because of SBO.
  // Reading from such string views is already UB if SBO is used.
  return std::make_tuple(std::move(surface), std::move(views));
} // surface is destroyed here; all pointers into that string are invalidated.

The move constructor of std::string may have to copy the character data into another buffer when SBO is in effect, so you cannot assume a string view is still valid after the string has been moved.
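One way to keep std::string and avoid dangling views, sketched here as an alternative rather than as part of the original code, is to return offsets instead of views and build each view on demand from wherever the string currently lives. The names Span, create_with_spans and view_of are hypothetical and introduced only for this illustration.

```cpp
#include <cstddef>
#include <string>
#include <string_view>
#include <tuple>
#include <utility>
#include <vector>

// Hypothetical type: a position and a length inside the owning string.
struct Span {
  std::size_t offset;
  std::size_t length;
};

// Same shape as create(), but it returns offsets instead of views, so nothing
// points into a string that is about to be moved.
std::tuple<std::string, std::vector<Span>> create_with_spans(std::string sample) {
  std::vector<Span> spans;
  spans.reserve(sample.size());
  for (std::size_t i = 0; i < sample.size(); ++i) {
    spans.push_back(Span{i, 1});  // one span per character
  }
  return std::make_tuple(std::move(sample), std::move(spans));
}

// Builds the view from the string's current buffer, wherever that is.
std::string_view view_of(const std::string& surface, Span span) {
  return std::string_view(surface).substr(span.offset, span.length);
}
```

After auto [surface, spans] = create_with_spans(line);, calling view_of(surface, spans[i]) yields a view into the surface the caller actually holds, whether or not SBO is in use. The view is only valid while that surface is alive and unmodified.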

Upvotes: 2

Ahmed AEK

Reputation: 18329

The culprit here is the short string optimization.

When the input is very small, the characters are stored inside the string object itself (here, on the stack) rather than on the heap, so your views point into a buffer on the stack. That buffer has to be copied; it cannot be moved.

If the input string were much longer, the buffer would be allocated on the heap, and every version of surface would point to the same location.
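The difference between short and long strings can be observed directly. The following is a rough sketch, not a portable diagnostic: stored_inline and data_survives_move are hypothetical helpers, and the exact length at which a string stops being stored inline depends on the standard library implementation.

```cpp
#include <cstdint>
#include <string>
#include <utility>

// Returns true when s.data() points inside the std::string object itself,
// which is how the short string optimization looks from the outside.
bool stored_inline(const std::string& s) {
  const auto object_begin = reinterpret_cast<std::uintptr_t>(&s);
  const auto object_end = object_begin + sizeof(std::string);
  const auto buffer = reinterpret_cast<std::uintptr_t>(s.data());
  return buffer >= object_begin && buffer < object_end;
}

// Returns true when the character buffer keeps its address across a move.
// The parameter is taken by value so the caller's string is left untouched.
bool data_survives_move(std::string source) {
  const char* before = source.data();
  std::string destination(std::move(source));
  return destination.data() == before;
}
```

On common implementations, a string such as "12" is typically reported as stored inline and its buffer does not survive a move, while a string of a thousand characters cannot fit inside the object and lives on the heap.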

A solution here is to use a std::vector<char>, which is guaranteed to store its elements on the heap, so moving it hands over the buffer instead of copying it, and you can still get a string_view into it.

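std::tuple<std::vector<char>, Views> create(const std::string& sample) {
  std::vector<char> surface(sample.begin(), sample.end()); // heap-allocated copy
  Views views;
  for (size_t i = 0; i < surface.size(); i++) {
    views.emplace_back(surface.data() + i, 1); // view over the vector's buffer
  }
  return std::make_tuple(std::move(surface), std::move(views)); // buffer is handed over, not copied
}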
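The property this relies on can be checked in isolation. The sketch below uses a hypothetical helper, buffer_survives_move, and assumes the default allocator: it records the address of the vector's buffer, moves the vector, and compares.

```cpp
#include <utility>
#include <vector>

// Returns true when the element buffer keeps its address across a move.
// The parameter is taken by value so the caller's vector is left untouched.
bool buffer_survives_move(std::vector<char> source) {
  const char* before = source.data();
  std::vector<char> destination(std::move(source));
  return destination.data() == before;
}
```

Because the moved-to vector takes over the same allocation, views created before the move keep pointing at live characters for as long as the returned vector exists and is not resized.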

Upvotes: 2
