Nico Engels

Reputation: 447

How to use span to wrap up command line args

Is this use of the upcoming C++20 std::span a correct, zero-overhead way to wrap the command line arguments?

#include <iostream>
#include <span>

int main(int argc, const char* argv[])
{
    for (auto s : std::span { argv, static_cast<std::size_t>(argc) })
      std::cout << s << std::endl;
}

If it is correct, could I go further and use it with std::string_view?

Upvotes: 2

Views: 2164

Answers (2)

Alex

Reputation: 21

I've run into this situation before: I wanted to play nicely with the standard library by using std::string_view instead of raw const char*s, and I didn't want to allocate anything up front, since on reflection nothing about the problem requires an allocation.

The awkward part, of course, is that we're given a variable number of arguments, and the arguments don't come with a size. We need a way to either a) know the size of every argument or b) have space already set aside to calculate each size once. So we seem to be stuck: we'd likely need to allocate to handle b) (unless we have a known restriction on how many arguments our program should handle), or else somehow magically know the size of each argument.

So here's an example of my dirty trick:

    // C++20 (first, last) constructor: argv[i + 1] - 1 is taken to be argv[i]'s terminating '\0'
    for (int i = 0; (i + 1) < argc; i++) {
        std::cout << std::string_view{argv[i], argv[i + 1] - 1} << '\n';
    }

Why does this work? The signature wouldn't tell you this, but there's a fair chance that any operating system's method of passing command line parameters between programs is going to be dead simple (for performance reasons, among others). How dead simple? The argv[] list tends to be an array of pointers into a single contiguous memory block. The terminal can parse the command line in one pass with known space constraints (it must already have a buffer long enough to hold whatever you typed), so it's trivial to copy the results into a contiguous block at least that large and then do a bit of work on it. The benefit is that passing along the same command line arguments, or a modified set, can be as simple as appending to or truncating that block.

So the implementation tends to be that the buffer from the terminal is stitched together, then split apart with '\0's, with the start of each argument saved into a list. Since each pointer is a C-style string, argv[i + 1] - 1 is the terminating '\0' of argv[i], and we can construct string_views that stop just short of it.
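To make the assumed layout concrete, here is a minimal sketch that fakes it with a hand-built buffer. The names block, fake_argv and length_from_next are hypothetical and exist only for illustration; a real process receives whatever layout its platform chooses, so this models the assumption rather than proving it.

```cpp
#include <array>
#include <cstddef>
#include <string_view>

// Hypothetical stand-in for the block a process might receive:
// three arguments packed back to back, each terminated by '\0'.
// Offsets: "prog" at 0, "-v" at 5, "input.txt" at 8.
inline constexpr char block[] = "prog\0-v\0input.txt";

// Pointer table into the block, playing the role of argv
// (including the trailing null pointer at argv[argc]).
inline const std::array<const char*, 4> fake_argv = {block, block + 5, block + 8, nullptr};

// Length of argument i derived from the next pointer instead of strlen.
// Only meaningful while argv[i + 1] is non-null and points just past
// the terminator of argv[i], which is exactly the layout assumption.
inline std::size_t length_from_next(const char* const* argv, int i) {
    return static_cast<std::size_t>(argv[i + 1] - argv[i]) - 1;
}

// Builds a view of argument i without scanning for '\0'.
inline std::string_view view_from_next(const char* const* argv, int i) {
    return std::string_view(argv[i], length_from_next(argv, i));
}
```

With this table, view_from_next(fake_argv.data(), 1) yields "-v" from pointer arithmetic alone. The last argument has no successor to measure against, so it cannot be handled this way.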

The exception is argv[argc-1], because argv[argc] is guaranteed to be a null pointer, so there is no following argument to measure against. So we can write something like this:

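inline std::string_view argument(int argc, const char* argv[], int i) {
    // Every argument but the last is assumed to end one byte before the next begins.
    return (i + 1) < argc
        ? std::string_view{argv[i], argv[i + 1] - 1}
        : std::string_view{argv[i]};  // last argument: falls back to strlen
}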

So in a weird way, we know the size of every argument but one, the last, and that's fine, because it's a constant space cost. Another approach is to always reserve space for the size of one argument (a single size_t variable sitting around somewhere) and use it as the length of argv[argc-1].

Now the only use of strlen-like functionality is to handle the last command line parameter. We can guard further against accessing arguments past argc if we'd like.

The downside is that if someone else invokes your program directly and doesn't lay out the arguments in memory the same way, your program fails in spectacular fashion. It's possible, though, to sanity check fairly quickly that you are dealing with a contiguous memory block: perform strlen on each argument and verify that the length equals (argv[i+1] - argv[i]) - 1.

We can still use the function above if we'd like, except that we now pass a reduced argc, so that any parameter we know isn't part of the initial contiguous block goes through std::string_view{argv[i]}, which uses the strlen-based constructor. The usage would then be something like:

std::string_view argument_i = i < argc ? argument(known_contiguous, argv, i) : std::string_view{};

The cost of doing this safely essentially matches creating a vector except for the allocation. I'd imagine my trick here probably can be converted to work with the ranges approach.

Here's a barebones class to handle this.

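#pragma once
#include <cstddef>
#include <cstring>
#include <string_view>

struct arguments {
private:
    int argc = {};
    const char** argv = {};
    int contiguous_argc = {};             // index of the first argument not verified as contiguous with its successor
    std::size_t last_argument_size = {};  // strlen of argv[contiguous_argc]
public:
    arguments(int arg_count, const char* arg_values[]) {
        argc = arg_count;
        argv = arg_values;

        for (int i = 0; i < argc; i++) {
            last_argument_size = std::strlen(argv[i]);
            if (argv[i + 1] == nullptr)
                break;
            // Assumes both pointers lie in the same block; a mismatch ends the contiguous run.
            std::size_t constant_length = static_cast<std::size_t>(argv[i + 1] - argv[i]) - 1;
            if (last_argument_size != constant_length)
                break;
            contiguous_argc += 1;
        }
    }

    std::string_view argument(int i) const {
        if (i < 0 || i >= argc)
            return std::string_view{};
        // Only touch argv[i + 1] for arguments verified contiguous in the constructor.
        if (i < contiguous_argc)
            return std::string_view{argv[i], static_cast<std::size_t>(argv[i + 1] - argv[i]) - 1};
        if (i == contiguous_argc)
            return std::string_view{argv[i], last_argument_size};
        return std::string_view{argv[i]};  // non-contiguous tail: strlen
    }
};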

Upvotes: 1

Barry

Reputation: 303087

If you use () instead of {} you don't need the really verbose cast:

std::span(argv, argc)

This gives you a range of char const*. You can convert those to string_view using transform. This has some overhead because you need to do a bunch of strlens:

std::span(argv, argc)
    | std::views::transform([](char const* v){ return std::string_view(v); })

For stuff like this, I have a function object that performs casting, which is pretty easy to write, so this could be:

std::span(argv, argc)
    | std::views::transform(static_cast_<std::string_view>)

Upvotes: 5
