Reputation: 179
I'm looking through the new features in C++20 and came across the std::string_view class.
The problem is that I want a view of part of a string that does not "lock" its parent, but a string_view object requires the underlying string to stay unmodified for the view's whole lifetime. I'm confused:
...
auto sub = [](const std::string &s) -> std::string_view { return std::string_view(s).substr(6, 5); };
...
std::string s("Hello world");
auto f = sub( s );
// I suspect that string_view should attach to indices of the original string
std::cout << f; // OK
// let's modify the original, expecting our string_view f to move its begin and end 5 characters forward
s.insert(0, "shift");
std::cout << f; // Fail: corrupted memory
I understand that it fails if I modify the viewed part, or if s is cleared, destroyed, etc. But why does a modification such as inserting into or erasing from "s" invalidate my view "f"?
Are there any other classes or adaptors in C++20 that would let me do this the way I imagine it?
Upvotes: 0
Views: 321
Reputation: 61920
Relative to the low cost of std::string_view as specified, this would add a lot of overhead for something most people don't need. Calling anything that resizes a string invalidates all pointers, references, and iterators into that string, because the string may need to reallocate and move its data to a new location. You can get around this by using indices rather than pointers, but it's not free, and as mentioned, not even cheap relative to the base cost of this lightweight abstraction.
To demonstrate what I mean, consider this minimized version of what such an implementation could look like (live example):
#include <algorithm> // std::min
#include <cstddef>   // std::size_t
#include <cstdio>    // std::putchar
#include <cstring>   // std::strlen
#include <string>

class stable_string_view {
    // Re-acquires the start of the data from the source on every call.
    const char*(*get_data_start)(const void*);
    // Highly recommended for sanity: std::size_t(*get_data_size)(const void*);
    const void* data_source;
    std::size_t first, last;

public:
    stable_string_view(const std::string& str) noexcept
        : get_data_start{[](const void* source) { return static_cast<const std::string*>(source)->data(); }},
          data_source{&str},
          first{0}, last{str.size()} {}

    stable_string_view(const char* cstr) noexcept
        : get_data_start{[](const void* source) { return static_cast<const char*>(source); }},
          data_source(cstr),
          first{0}, last{std::strlen(cstr)} {}

    auto size() const noexcept -> std::size_t {
        return last - first;
    }

    auto operator[](std::size_t index) const -> const char& {
        return get_data_start(data_source)[first + index];
    }

    auto substr(std::size_t pos = 0, std::size_t count = -1) const -> stable_string_view {
        if (pos > size()) {
            // Removed: This piece of code would distract from the basic answer.
        }

        auto rcount = std::min(count, size() - pos);
        auto copy = *this;
        copy.first += pos;
        copy.last = copy.first + rcount;
        return copy;
    }

    void output() const {
        auto data = get_data_start(data_source);
        for (auto i = first; i < last; ++i) {
            std::putchar(data[i]);
        }
        std::putchar('\n');
    }
};
The first thing that should jump out at you is this:
const char*(*get_data_start)(const void*);
What is this exactly? It's roughly the minimum we need in order to be able to index the original data. Calling this goes and acquires a fresh pointer. Realistically, we've just increased the size of every single object by 50%. That means 50% more to copy around and less room in the cache if you have a lot of these floating around (e.g., a parser storing views to the original file text). In this implementation, it's 100% because there's both a function pointer and an opaque pointer.
We can always shove multiple things behind a pointer to reduce size, but that's another runtime cost for sure. At this point, the lightweight abstraction isn't so lightweight anymore. And that's without even having any sanity checking (which you really, really should have if you're going to be changing the size of the data source while this thing is still around), but that requires another size and/or performance hit.
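The size argument can be checked with a small sketch. The two structs below are hypothetical stand-ins that only mirror the data members being discussed; the standard does not fix the layout of std::string_view, so the comparison is an assumption about typical implementations rather than a guarantee.

#include <cstddef>
#include <string_view>

// Hypothetical stand-in: the pointer-plus-size shape commonly used for std::string_view.
struct pointer_and_size {
    const char* data;
    std::size_t size;
};

// Hypothetical stand-in: the members of stable_string_view from this answer.
struct indirect_indices {
    const char* (*get_data_start)(const void*);
    const void* data_source;
    std::size_t first, last;
};

// Size of the real std::string_view on this implementation, for comparison.
// It commonly equals sizeof(pointer_and_size), but that is not guaranteed.
constexpr std::size_t std_view_bytes = sizeof(std::string_view);

// Ratio of the two footprints; a value of 2.0 would mean a 100% increase.
constexpr double growth_factor() {
    return static_cast<double>(sizeof(indirect_indices)) / sizeof(pointer_and_size);
}

Printing growth_factor() and std_view_bytes on the target platform shows how much extra room each view takes there.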
Now surely this is only a size hit, right? Wrong. I've compiled a basic comparison between std::string_view and this one:
char with_stable(stable_string_view view) {
    return view.substr(5, 8)[3];
}

char with_standard(std::string_view view) {
    return view.substr(5, 8)[3];
}
with_stable(stable_string_view):         # @with_stable(stable_string_view)
        push    rbx
        mov     rbx, qword ptr [rsp + 32]
        mov     rdx, qword ptr [rsp + 40]
        sub     rdx, rbx
        cmp     rdx, 4
        jbe     .LBB0_2
        lea     rax, [rsp + 16]
        mov     rdi, qword ptr [rax + 8]
        call    qword ptr [rax]
        mov     al, byte ptr [rbx + rax + 8]
        pop     rbx
        ret
.LBB0_2:
        mov     edi, offset .L.str.2
        mov     esi, 5
        xor     eax, eax
        call    std::__throw_out_of_range_fmt(char const*, ...)

with_standard(std::basic_string_view<char, std::char_traits<char> >): # @with_standard(std::basic_string_view<char, std::char_traits<char> >)
        push    rax
        cmp     rdi, 4
        jbe     .LBB1_2
        mov     al, byte ptr [rsi + 8]
        pop     rcx
        ret
.LBB1_2:
        mov     rcx, rdi
        mov     edi, offset .L.str.3
        mov     esi, offset .L.str.2
        mov     edx, 5
        xor     eax, eax
        call    std::__throw_out_of_range_fmt(char const*, ...)
Now some of this is exception handling, which I tried to make as close as I could to the with_standard code for an easier comparison. The line that's actually important is:
call qword ptr [rax]
This is the other performance price of that function pointer in the class. The compiler can't always see through it, so something as simple as getting the start of the data can be much more expensive than it should be. You could cut down on this a bit by storing a const std::string* instead, but that would defeat the purpose of a string_view that works with any contiguous range of characters. Maybe it will suit your needs, though.
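As a rough illustration of that std::string-only variant, here is a minimal sketch. The class name indexed_string_ref and its member advance are hypothetical, not part of any library. It stores an offset and a length instead of a pointer into the buffer, so it stays memory-safe across reallocation, but it does not follow the characters when something is inserted in front of them; the caller has to adjust the offset.

#include <cstddef>
#include <string>
#include <string_view>

// Hypothetical helper, not a standard class: it remembers where the
// substring is (offset and length) rather than pointing into the buffer.
class indexed_string_ref {
    const std::string* source_;   // the string must outlive this object
    std::size_t pos_, len_;

public:
    indexed_string_ref(const std::string& s, std::size_t pos, std::size_t len) noexcept
        : source_{&s}, pos_{pos}, len_{len} {}

    // Call after inserting n characters somewhere before the viewed range.
    void advance(std::size_t n) noexcept { pos_ += n; }

    // Builds a fresh std::string_view on every call. std::string_view::substr
    // throws std::out_of_range if pos_ is past the end and clamps the length.
    // The result is only valid until the source string is modified again.
    std::string_view view() const {
        return std::string_view{*source_}.substr(pos_, len_);
    }
};

With s = "Hello world" and indexed_string_ref f(s, 6, 5), f.view() is "world". After s.insert(0, "shift") the same indices select "ello ", and calling f.advance(5) brings the view back to "world".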
In conclusion, it's entirely possible to maintain a generic string view that remains valid through iterator invalidation. However, doing so incurs more space and/or runtime overhead for something that is supposed to be a vocabulary type in APIs. Giving people a good reason to avoid using vocabulary types when they don't need an extra guarantee doesn't seem like a good idea. There could be a sister class that does this, but I haven't seen demand for one. That would be something to sort out in preparation for writing a proposal.
Upvotes: 4
Reputation: 238361
I inspecting new in c++20 for features and here is std::string_view class.
Note that std::string_view was introduced back in C++17, not C++20.
Why if modification occurs like: inserting/erasing in "s" it will invalidate my viewable part "f"?
Because that is the way std::string is specified. Certain operations invalidate all references into the string, and a string view is a reference into the string. Increasing the size of the string beyond its capacity is one such operation, because the characters have to move to a new, larger buffer.
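A small sketch of why this matters in practice: once the string grows past its capacity, its characters live at a different address, so anything still holding the old address dangles. The function name is made up for illustration, and the address comparison reflects how common standard library implementations behave rather than something the standard spells out.

#include <cstdint>
#include <string>

// Returns true if growing the string past its capacity moved its buffer.
bool buffer_moves_when_capacity_exceeded() {
    std::string s("Hello world");
    // Record the address as an integer so the stale pointer itself is never used.
    const auto before = reinterpret_cast<std::uintptr_t>(s.data());
    // Appending capacity() + 1 characters cannot fit in the current buffer.
    s.append(s.capacity() + 1, '!');
    const auto after = reinterpret_cast<std::uintptr_t>(s.data());
    return before != after;
}

A std::string_view created before the append would still hold the old address, which is why reading it afterwards is undefined behaviour.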
Is there any other classes/adaptors c++20 may give me to get this done as I see it?
If you want a substring that is not invalidated when the original string is invalidated, then what you can use to store the substring is std::string.
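A minimal sketch of that approach, reusing the values from the question (the function name is only for illustration): std::string::substr returns a new string that owns a copy of the characters, so later changes to the original cannot touch it.

#include <string>

// Copies the five characters starting at index 6 into a string that owns them.
std::string owned_substring_survives_insert() {
    std::string s("Hello world");
    std::string f = s.substr(6, 5);  // independent copy of "world"
    s.insert(0, "shift");            // may reallocate s; f is unaffected
    return f;
}

The cost is one copy, and possibly an allocation, per substring. The copy also no longer reflects later edits to s.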
Upvotes: 1