recipriversexclusion
recipriversexclusion

Reputation: 15366

Two short questions about std::vector

  1. When a vector is created it has a default allocation size (probably this is not the right term to use, maybe step size?). When the number of elements reaches this size, the vector is resized. Is this size compiler specific? Can I control it? Is this a good idea?
  2. Do repeated calls to vector::size() recount the number of elements (O(n) calculation) or is this value stored somewhere (O(1) lookup). For example, in the code below

    // Split given string on whitespace
    vector<string> split( const string& s )
    {
        vector<string> tokens;
        string::size_type i, j;
        i = 0;
        while ( i != s.size() ) {
            // ignore leading blanks
            while ( isspace(s[i]) && i != s.size() ) {
                i++;
            }
            // found a word, now find its end
            j = i;
            while ( !isspace(s[j]) && j != s.size() ) {
                j++;
            }
            // if we found a word, add it to the vector
            if ( i != j ) { 
                tokens.push_back( s.substr(i, j-i) );
                i = j;
            }
        }
        return tokens;
    }
    

assuming s can be very large, should I call s.size() only once and store the result?

Thanks!

Upvotes: 3

Views: 378

Answers (6)

greyfade
greyfade

Reputation: 25647

When the number of elements reaches this size, the vector is resized. Is this size compiler specific? Can I control it? Is this a good idea?

In general, this is a library-specific behavior, but you may be able to influence this behavior by specifying a custom allocator, which is non-trivial work.

Do repeated calls to vector::size() recount the number of elements (O(n) calculation) or is this value stored somewhere (O(1) lookup).

Most implementations store the size as a member. It's a single memory read.

Upvotes: 1

tzaman
tzaman

Reputation: 47770

Unrelated to your actual questions, but here's a more "STL" way of doing what you're doing:

vector<string> split(const string& s)
{
    istringstream stream(s);
    istream_iterator<string> iter(stream), eos;
    vector<string> tokens;
    copy(iter, eos, back_inserter(tokens));
    return tokens;
}

Upvotes: 1

Jerry Coffin
Jerry Coffin

Reputation: 490048

In most cases, you should leave the allocation alone unless you know the number of items ahead of time, so you can reserve the correct amount of space.

At least in every case of which I'm aware, std::vector::size() just returns a stored value, so it has constant complexity. In theory, the C++ standard allows it to do otherwise. There are reasons to allow otherwise for some other containers, primarily std::list, and rather than make a special case for those, they simply recommend constant time for all containers instead of requiring it for any. I can't quite imagine a vector::size that counted elements though -- I'm pretty no such thing has ever existed.

P.S., an easier way to do what your code above does, is something like this:

std::vector<string> split(std::string const &input) {
    vector<string> ret;
    istringstream buffer(input);

    copy(istream_iterator<string>(input),
         istream_iterator<string>(),
         back_inserter(ret));

    return ret;
}

Edit: IMO, The C++ Standard Library, by Nicolai Josuttis is an excellent reference on such things.

Upvotes: 5

James McNellis
James McNellis

Reputation: 354999

The actual size of the capacity increment is implementation-dependent, but it has to be (roughly) exponential to support the container's complexity requirements. As an example, the Visual C++ standard library will allocate exactly the space required for the first few elements (five, if I recall correctly), then increases the size exponentially after that.

The size has to be stored somehow in the vector, otherwise it doesn't know where the end of the sequence is! However, it may not necessarily be stored as an integer. The Visual C++ implementation (again, as an example) stores three pointers:

  1. a pointer to the beginning of the underlying array,
  2. a pointer to the current end of the sequence, and
  3. a pointer to the end of the underlying array.

The size can be computed from (1) and (2); the capacity can be computed from (1) and (3).

Other implementations might store the information differently.

Upvotes: 4

casablanca
casablanca

Reputation: 70701

  1. The resizing mechanism is usually fixed. (Most compilers double the size of the vector when it reaches the limit.) The C++ standard specifies no way to control this behaviour.

  2. The size is internally updated whenever you insert/remove elements and when you call size(), it's returned immediately. So yes, it's O(1).

Upvotes: 1

Ned Batchelder
Ned Batchelder

Reputation: 375484

  1. It's library-specific. You might be able to control the incremental allocation, but you might not.

  2. The size is stored, so it is very fast (constant time) to retrieve. How else could it work? C has no way of knowing in general whether a memory location is "real data" or not.

Upvotes: 1

Related Questions