code_fodder

Reputation: 16321

bash extract segments of a string and store in variables

I want to convert the output from cppclean into cppcheck-like xml sections, such that:

./bit_limits.cpp:25: static data 'bit_limits::max_name_length'

becomes:

<error id="static data" msg="bit_limits::max_name_length">
    <location file="./bit_limits.cpp" line="25"/>
</error>

I started with some awk:

test code:

echo "./bit_limits.cpp:25: static data 'bit_limits::max_name_length'" > test.out
cat test.out | awk -F ":" '{print "<error id=\""$3"\""}
                           {print "msg=\""}{for(i=4;i<=NF;++i)print ":"$i}{print "\">"}
                           {print "<location file=\""$1"\" line=\""$2"\"/>"}
                           {print "</error>"}'

Note: to run this you need to put the cat command back into one line - I printed it over multi-lines for ease of reading.

Explanation: I am using awk with a colon ":" as the field delimiter, which splits the line into useful chunks that I then try to assemble into the XML.

This is close, but not quite right, it produces:

<error id=" static data 'bit_limits"
msg="
:
:max_name_length'
">
<location file="./bit_limits.cpp" line="25"/>
</error>

The id field should just be "static data" and the msg field should be "'bit_limits::max_name_length'", but other than that it is OK. I don't mind the output being split over multiple lines at the moment, though I would prefer that awk did not print a newline each time.
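To see why the fields come out this way, it can help to print each colon-delimited field on its own line. The following is only a diagnostic sketch; `show_fields` is a hypothetical helper name introduced here for illustration.

```shell
#!/usr/bin/env bash
# show_fields is a hypothetical helper name, not part of awk or cppclean.
# It prints every colon-delimited field in brackets so whitespace is visible.
show_fields() {
    printf '%s\n' "$1" |
        awk -F ':' '{ for (i = 1; i <= NF; ++i) printf "field %d: [%s]\n", i, $i }'
}

show_fields "./bit_limits.cpp:25: static data 'bit_limits::max_name_length'"
```

Because the C++ name contains `::`, the colon delimiter also cuts through it: field 3 holds ` static data 'bit_limits`, field 4 is empty, and field 5 holds `max_name_length'`. A single-character delimiter therefore cannot separate the id from the msg on its own.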

Update: As @charlesduffy pointed out, some context: I want to do this in bash because I want to embed this code in a makefile (or just a normal bash script) for maximum portability (i.e. no need for python or other tools).

Upvotes: 4

Views: 320

Answers (2)

Cyrus

Reputation: 88553

With bash and a regex:

x="./bit_limits.cpp:25: static data 'bit_limits::max_name_length'"
[[ $x =~ (.+):([0-9]+):\ (.+)\ \'(.+)\' ]]

declare -p BASH_REMATCH

Output:

declare -ar BASH_REMATCH='([0]="./bit_limits.cpp:25: static data '\''bit_limits::max_name_length'\''" [1]="./bit_limits.cpp" [2]="25" [3]="static data" [4]="bit_limits::max_name_length")'

Elements 1 to 4 of the array BASH_REMATCH contain the captured substrings.

From man bash:

BASH_REMATCH: An array variable whose members are assigned by the =~ binary operator to the [[ conditional command. The element with index 0 is the portion of the string matching the entire regular expression. The element with index n is the portion of the string matching the nth parenthesized subexpression. This variable is read-only.
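As a rough sketch of how the captured groups could be assembled into the XML from the question, the function below reuses the same pattern. The name `cppclean_to_xml` is hypothetical, and the sketch assumes every input line has the `file:line: id 'msg'` shape shown in the question. It does not escape XML special characters such as `<` or `&`.

```shell
#!/usr/bin/env bash
# cppclean_to_xml is a hypothetical helper name used only for illustration.
# It converts one cppclean line into a cppcheck-like XML block.
cppclean_to_xml() {
    local line=$1
    # Same pattern as above, kept in a variable so no backslash escapes are needed.
    local re="(.+):([0-9]+): (.+) '(.+)'"
    if [[ $line =~ $re ]]; then
        printf '<error id="%s" msg="%s">\n' "${BASH_REMATCH[3]}" "${BASH_REMATCH[4]}"
        printf '    <location file="%s" line="%s"/>\n' "${BASH_REMATCH[1]}" "${BASH_REMATCH[2]}"
        printf '</error>\n'
    else
        # Lines that do not match the assumed shape are reported on stderr.
        printf 'unrecognised line: %s\n' "$line" >&2
        return 1
    fi
}

cppclean_to_xml "./bit_limits.cpp:25: static data 'bit_limits::max_name_length'"
```

To process a whole file, the function could be called from a loop such as `while IFS= read -r line; do cppclean_to_xml "$line"; done < test.out`.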

Upvotes: 6

Graeme

Reputation: 3041

Probably more complex than it needs to be:

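awk '{
    # Reset per-line state so values do not accumulate across input lines.
    id = ""
    msg = ""
    # $1 is "file:line:", so split it on colons.
    split($1, file_line, ":")
    field = 2
    # Collect words up to the first one that starts with a single quote.
    while(field <= NF && substr($field, 1, 1) != "'\''") {
        id = id " " $field
        ++field
    }
    id = substr(id, 2)
    while(field <= NF) {
        msg = msg " " $field
        ++field
    }
    # Drop the leading space and both surrounding single quotes.
    msg = substr(msg, 3, length(msg) - 3)
    printf("<error id=\"%s\" msg=\"%s\">\n", id, msg)
    printf("    <location file=\"%s\" line=\"%s\"/>\n", file_line[1], file_line[2])
    print "</error>"
}' test.out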

Upvotes: 1
