Reputation: 17276

How to use socket recv in combination with std::vector without the cost of resize in C++23?

There is a method to read data from a socket and store the received data into a std::vector.

For convenience, here is a copy of the code.

const size_t recv_buffer_length = 128;

std::vector<char> recv_buffer;
recv_buffer.resize(recv_buffer_length);

const auto byte_count = recv(peer_fd, recv_buffer.data(), recv_buffer_length, 0);
// recv returns -1 on error, which must not be passed to resize
recv_buffer.resize(byte_count > 0 ? static_cast<size_t>(byte_count) : 0);

// does not print anything. `.size() > 0`, but each element is `\0`
for(const auto c: recv_buffer)
{
    std::print("{}", c);
}

The solution is inefficient. The reason is that resize uses a default value to initialize the elements of the std::vector.

When I initially saw this question, I thought it was performing a reserve operation to avoid this additional overhead. However, I was mistaken.

There is no need to fill the buffer with a default value in this case. recv returns the number of bytes received, so the vector simply needs to be resized to this size after recv has filled it with data.

The problem is that if the size is later increased, resize value-initializes the newly added elements, so any data recv wrote there is overwritten with zeros.
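To make the effect concrete, here is a minimal sketch that stays within defined behavior: it shrinks a vector and grows it again, and the regrown elements come back as `\0`. The helper name shrink_then_grow is hypothetical and only exists for this illustration.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical helper: shrinks a buffer to `kept` elements, grows it to
// `grown` elements, and returns the result so its contents can be inspected.
std::vector<char> shrink_then_grow(std::vector<char> buffer,
                                   std::size_t kept, std::size_t grown)
{
    assert(kept <= buffer.size() && kept <= grown);
    buffer.resize(kept);   // elements past `kept` are destroyed
    buffer.resize(grown);  // the added elements are value-initialized ('\0')
    return buffer;
}
```

Calling it with a vector of four 'x' characters, kept = 2 and grown = 4 gives back 'x', 'x', '\0', '\0'. The old contents past the kept size are not preserved.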

Is there a solution to this? Preferably one that avoids having to track a pointer and a length variable separately.

To explain in greater detail:

The program runs in a loop, calling recv to populate a vector
The first time the program runs, reserve is called, followed by resize. Both functions take some value which is the maximum size of the buffer used by recv. resize is called with a value of zero to ensure the program is in a sensible state.
The program enters the recv loop.
recv is called, returning 200 bytes of data. This is smaller than the vector capacity, and 200 bytes are written into the buffer. resize is called with an argument of 200. All of the first 200 bytes are overwritten with zeros.

The solution appears to be:

Don't call resize with an argument of zero.

However, since the program runs in a loop, this will only work for the first iteration. After that the buffer has been resized to some smaller value, and at some point a future call will need to call resize with a larger value.

Upvotes: -2

Answers (2)

463035818_is_not_an_ai

Reputation: 123114

You need to provide a buffer with recv_buffer_length elements for recv to write to. There is no way around that, and a std::vector<char> resized to 0 will not work. reserve does not help either: it only allocates capacity, and writing to elements beyond size() is undefined behavior. If you want to reuse the same buffer, there is no point in resizing a vector in each iteration. The code you posted calls recv_buffer.resize(byte_count); for more convenient handling of the data, so that you don't have to keep track of byte_count separately from the container. std::vector cannot do both out of the box. You can, however, wrap a fixed-size buffer to encapsulate the number of elements read:

struct my_buffer {
     std::array<char,recv_buffer_length> data;
     size_t size = 0;
     auto begin() { return data.begin(); }
     auto end() { return data.begin()+size; }
     void read(auto peer_fd) {
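        // recv needs a pointer to the storage; it returns -1 on error
        const auto n = recv(peer_fd, data.data(), data.size(), 0);
        size = n > 0 ? static_cast<size_t>(n) : 0;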
     }
};

Upvotes: 2

user2138149

Reputation: 17276

To be honest, std::vector is the wrong tool for the job here.

With the default allocator, there is no way to re-use the same std::vector object multiple times while avoiding the overhead of resize writing value-initialized elements (zeros) into the buffer.

You can see this is a problem from the following snippet

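std::vector<char> buffer;
buffer.reserve(1024);
buffer.resize(1024);

for(;;) {
    // note: after the first iteration size() may be smaller than capacity(),
    // and writing past size() through data() is undefined behavior
    const auto buffer_size = recv(peer_fd, buffer.data(), buffer.capacity(), 0);
    if (buffer_size <= 0) break; // error or orderly shutdown

    buffer.resize(buffer_size); // on average, will write O(N) bytes of data
    // (depends a bit on the statistics of value of `buffer_size`)
}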

My suggestion is to use std::unique_ptr<uint8_t[]> and carry around the additional variables for size and capacity where necessary.

To give a bit of further detail, if the last size is buffer_size_last and the current loop value is buffer_size

when buffer_size_last >= buffer_size, no additional bytes are written
however when buffer_size_last < buffer_size then buffer_size - buffer_size_last bytes of zeros are written
if buffer_size is highly variable, for example alternating between 0 and N, then on average N/2 bytes will be written each loop, which is the worst case
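The bookkeeping above can be sketched as a small calculation. Both function names are hypothetical, and the model only counts the elements that resize adds, nothing else.

```cpp
#include <cstddef>
#include <vector>

// Hypothetical helper: number of elements that resize value-initializes when
// the vector size changes from `last` to `current`.
constexpr std::size_t zero_filled(std::size_t last, std::size_t current)
{
    return current > last ? current - last : 0;
}

// Total zero-filled bytes over a sequence of recv sizes, starting from a
// vector whose size is `initial`.
std::size_t total_zero_filled(std::size_t initial,
                              const std::vector<std::size_t>& sizes)
{
    std::size_t total = 0;
    std::size_t last = initial;
    for (const std::size_t current : sizes) {
        total += zero_filled(last, current);
        last = current;
    }
    return total;
}
```

For example, starting from a size of 1024 and receiving 200, 100, 300 and 50 bytes in turn, only the step from 100 to 300 grows the vector, so 200 bytes of zeros are written in total.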

However, it is worth noting the following. As mentioned by MichaelRoy, the typical time taken to fill the buffer with zeros will probably be smaller than the time waiting for a recv. So for many applications, this overhead is probably not significant.

Upvotes: 1
