Reputation: 71

Iterate through split string by another string

I want to create a bash script to parse the data returned by this command:

openvpn --show-pkcs11-ids /usr/lib/libeTPkcs11.so

The typical output is:

The following objects are available for use.
Each object shown below may be used as parameter to
--pkcs11-id option please remember to use single quote mark.

Certificate
       DN:             XXX
       Serial:         XXXX
       Serialized id:  XXXX

Certificate
       DN:             XXXX
       Serial:         XXXX
       Serialized id:  XXXX

Certificate
       DN:             XXXXX
       Serial:         XXXX
       Serialized id:  XXXX

I want to get a bash array containing 3 elements: the 3 "Certificate" blocks. I tried many ways of splitting, but all of them only echo the result rather than build an actual array.

Any ideas ?

Thx !

Upvotes: 0

Answers (2)

David C. Rankin

Reputation: 84579

This is a case where it would be much simpler (and much, much faster) to use awk. awk provides arrays and is much more capable at processing input records than read. With awk you simply write rules to be applied to each line of input. In your case you just need to recognize whether the first field of the line is "DN:", "Serial:", or "Serialized". You can then store the associated value in a separate array, say arrays dn, serial, and serid. To accomplish this in awk you need nothing more than:

awk '
    $1 == "Certificate" {n++};              # increment n
    NF == 2 {                               # fill dn & serial array
        $1 == "DN:" && dn[n]=$2
        $1 == "Serial:" && serial[n]=$2
    }
    NF == 3 {                               # fill serid array
        $1 == "Serialized" && serid[n]=$3
    }
END {   # output results
    print "\nDN:\t\tSerial:\t\tSerialized id:"
    for (i = 1; i <= n; i++) print dn[i], "\t\t", serial[i], "\t\t", serid[i]
}' file

Above, if the first field ($1) is "Certificate" you just increment a counter. If there are 2 fields in the line (NF == 2), you check whether the line begins with "DN:" or "Serial:" and add the 2nd field to the proper array. If the line has 3 fields ("Serialized", "id:" and your value), you store the value in the serid array.

With all values stored, you can iterate over the arrays in the END rule, providing any output you need. Above it simply outputs the content in tabular form. You can just copy and middle-mouse-paste it into the command line to test.

Example Use/Output

$ awk '
>     $1 == "Certificate" {n++};              # increment n
>     NF == 2 {                               # fill dn & serial array
>         $1 == "DN:" && dn[n]=$2
>         $1 == "Serial:" && serial[n]=$2
>     }
>     NF == 3 {                               # fill serid array
>         $1 == "Serialized" && serid[n]=$3
>     }
> END {   # output results
>     print "\nDN:\t\tSerial:\t\tSerialized id:"
>     for (i = 1; i <= n; i++) print dn[i], "\t\t", serial[i], "\t\t", serid[i]
> }' file

DN:             Serial:         Serialized id:
XXX              XXXX            XXXX
XXXX             XXXX            XXXX
XXXXX            XXXX            XXXX
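
The tabular output above is meant for reading. Since the question asks for a bash array, here is a minimal sketch of one way to load the same values into one with mapfile, which assumes bash 4 or newer. The input variable holds placeholder data copied from the question in place of the real openvpn output, and the array name certs and the tab-separated layout are choices made for this illustration only.

```bash
#!/usr/bin/env bash
# Placeholder input standing in for the openvpn output (values from the question).
input='The following objects are available for use.
Each object shown below may be used as parameter to
--pkcs11-id option please remember to use single quote mark.

Certificate
       DN:             XXX
       Serial:         XXXX
       Serialized id:  XXXX

Certificate
       DN:             XXXX
       Serial:         XXXX
       Serialized id:  XXXX

Certificate
       DN:             XXXXX
       Serial:         XXXX
       Serialized id:  XXXX
'

# awk prints one tab-separated line per certificate; mapfile -t stores each
# line as one array element with the trailing newline removed.
mapfile -t certs < <(printf '%s' "$input" | awk '
    $1 == "Certificate" { n++ }
    $1 == "DN:"         { dn[n] = $2 }
    $1 == "Serial:"     { serial[n] = $2 }
    $1 == "Serialized"  { serid[n] = $3 }
    END { for (i = 1; i <= n; i++) print dn[i] "\t" serial[i] "\t" serid[i] }')

# Walk the array and split each element back into its three values.
for c in "${certs[@]}"; do
    IFS=$'\t' read -r dn serial serid <<< "$c"
    printf 'DN=%s serial=%s id=%s\n' "$dn" "$serial" "$serid"
done
```

With the real command, the printf would be replaced by the openvpn call from the question. Values that themselves contain spaces would need a different field rule than $2 and $3.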

For large file processing, awk will be orders of magnitude faster than looping in a shell script. Let me know if this satisfies your needs or if you need additional help.

Edit Per-Comment

If your file mixes tabs and spaces as separators, note that awk's default field separator already treats any run of spaces and tabs as a single separator and ignores leading whitespace. If you would rather state the separator explicitly, you can pass awk a regular expression: a sequence of one or more spaces or tabs is written -F'[ \t]+'. The example below uses that separator. (Note: the field numbers change as a result, because the leading whitespace on the indented lines now produces an empty first field.)

awk -F'[ \t]+' '
    $1 == "Certificate" {n++};              # increment n
    NF == 3 {                               # fill dn & serial array
        $2 == "DN:" && dn[n]=$3
        $2 == "Serial:" && serial[n]=$3
    }
    NF == 4 {                               # fill serid array
        $2 == "Serialized" && serid[n]=$4
    }
END {   # output results
    print "\nDN:\t\tSerial:\t\tSerialized id:"
    for (i = 1; i <= n; i++) print dn[i], "\t\t", serial[i], "\t\t", serid[i]
}' f

Example Use/Output

With your same data you would then have:

$ awk -F'[ \t]+' '
>     $1 == "Certificate" {n++};              # increment n
>     NF == 3 {                               # fill dn & serial array
>         $2 == "DN:" && dn[n]=$3
>         $2 == "Serial:" && serial[n]=$3
>     }
>     NF == 4 {                               # fill serid array
>         $2 == "Serialized" && serid[n]=$4
>     }
> END {   # output results
>     print "\nDN:\t\tSerial:\t\tSerialized id:"
>     for (i = 1; i <= n; i++) print dn[i], "\t\t", serial[i], "\t\t", serid[i]
> }' f

DN:             Serial:         Serialized id:
XXX              XXXX            XXXX
XXXX             XXXX            XXXX
XXXXX            XXXX            XXXX

Not knowing what the space/tab makeup of your posted text actually is, this should handle either case.
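
The field-number shift noted above can be seen on a single indented line. The sketch below is only an illustration; the variable names line, default_fs and regex_fs are made up for it.

```bash
#!/usr/bin/env bash
line='       DN:             XXX'   # indented line, as in the sample input

# Default separator: leading blanks are ignored, so "DN:" is field 1.
default_fs=$(printf '%s\n' "$line" | awk '{ print NF ": " $1 }')

# Explicit regex separator: the leading blanks delimit an empty field 1,
# so "DN:" moves to field 2 and NF grows by one.
regex_fs=$(printf '%s\n' "$line" | awk -F'[ \t]+' '{ print NF ": " $2 }')

printf 'default FS -> %s\n' "$default_fs"   # 2: DN:
printf 'regex FS   -> %s\n' "$regex_fs"     # 3: DN:
```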

Further Update Posting Input Contents Taken From Question

The following is the input file f (or file) used with the examples above. It was taken from your question, but there is no guarantee the space/tab translation is the same given the copy/paste into the question. The last example above should handle it regardless. The only other caveat is that if you feed awk a file with DOS line endings, it won't work. You can check by running the utility file yourfilename, which will report whether DOS CRLF line endings are present. You can then use dos2unix yourfilename to correct the problem and convert the file to Unix/POSIX line endings.
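
If converting the file is not convenient, one alternative is to strip the carriage return inside awk itself before the other rules run. This is a minimal sketch, not part of the examples above; the two-line DOS input is built with printf purely for the demonstration, and the variable name result is made up.

```bash
#!/usr/bin/env bash
# Two lines with DOS (CR LF) endings, generated here just for the test.
result=$(printf 'Certificate\r\n       DN:             XXX\r\n' |
    awk '{ sub(/\r$/, "") }                  # drop a trailing CR, if present
         $1 == "Certificate" { n++ }
         $1 == "DN:"         { dn[n] = $2 }
         END                 { print n ": " dn[1] }')

printf '%s\n' "$result"   # 1: XXX
```

Because sub() modifies $0, awk re-splits the fields, so the rules that follow see the cleaned values.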

Example Input File

$ cat f
The following objects are available for use.
Each object shown below may be used as parameter to
--pkcs11-id option please remember to use single quote mark.

Certificate
       DN:             XXX
       Serial:         XXXX
       Serialized id:  XXXX

Certificate
       DN:             XXXX
       Serial:         XXXX
       Serialized id:  XXXX

Certificate
       DN:             XXXXX
       Serial:         XXXX
       Serialized id:  XXXX

Hexdump of Contents

$ hexdump -Cv f
00000000  54 68 65 20 66 6f 6c 6c  6f 77 69 6e 67 20 6f 62  |The following ob|
00000010  6a 65 63 74 73 20 61 72  65 20 61 76 61 69 6c 61  |jects are availa|
00000020  62 6c 65 20 66 6f 72 20  75 73 65 2e 0a 45 61 63  |ble for use..Eac|
00000030  68 20 6f 62 6a 65 63 74  20 73 68 6f 77 6e 20 62  |h object shown b|
00000040  65 6c 6f 77 20 6d 61 79  20 62 65 20 75 73 65 64  |elow may be used|
00000050  20 61 73 20 70 61 72 61  6d 65 74 65 72 20 74 6f  | as parameter to|
00000060  0a 2d 2d 70 6b 63 73 31  31 2d 69 64 20 6f 70 74  |.--pkcs11-id opt|
00000070  69 6f 6e 20 70 6c 65 61  73 65 20 72 65 6d 65 6d  |ion please remem|
00000080  62 65 72 20 74 6f 20 75  73 65 20 73 69 6e 67 6c  |ber to use singl|
00000090  65 20 71 75 6f 74 65 20  6d 61 72 6b 2e 0a 0a 43  |e quote mark...C|
000000a0  65 72 74 69 66 69 63 61  74 65 0a 20 20 20 20 20  |ertificate.     |
000000b0  20 20 44 4e 3a 20 20 20  20 20 20 20 20 20 20 20  |  DN:           |
000000c0  20 20 58 58 58 0a 20 20  20 20 20 20 20 53 65 72  |  XXX.       Ser|
000000d0  69 61 6c 3a 20 20 20 20  20 20 20 20 20 58 58 58  |ial:         XXX|
000000e0  58 0a 20 20 20 20 20 20  20 53 65 72 69 61 6c 69  |X.       Seriali|
000000f0  7a 65 64 20 69 64 3a 20  20 58 58 58 58 0a 0a 43  |zed id:  XXXX..C|
00000100  65 72 74 69 66 69 63 61  74 65 0a 20 20 20 20 20  |ertificate.     |
00000110  20 20 44 4e 3a 20 20 20  20 20 20 20 20 20 20 20  |  DN:           |
00000120  20 20 58 58 58 58 0a 20  20 20 20 20 20 20 53 65  |  XXXX.       Se|
00000130  72 69 61 6c 3a 20 20 20  20 20 20 20 20 20 58 58  |rial:         XX|
00000140  58 58 0a 20 20 20 20 20  20 20 53 65 72 69 61 6c  |XX.       Serial|
00000150  69 7a 65 64 20 69 64 3a  20 20 58 58 58 58 0a 0a  |ized id:  XXXX..|
00000160  43 65 72 74 69 66 69 63  61 74 65 0a 20 20 20 20  |Certificate.    |
00000170  20 20 20 44 4e 3a 20 20  20 20 20 20 20 20 20 20  |   DN:          |
00000180  20 20 20 58 58 58 58 58  0a 20 20 20 20 20 20 20  |   XXXXX.       |
00000190  53 65 72 69 61 6c 3a 20  20 20 20 20 20 20 20 20  |Serial:         |
000001a0  58 58 58 58 0a 20 20 20  20 20 20 20 53 65 72 69  |XXXX.       Seri|
000001b0  61 6c 69 7a 65 64 20 69  64 3a 20 20 58 58 58 58  |alized id:  XXXX|
000001c0  0a                                                |.|
000001c1

Let me know the results of your file examination.

Upvotes: 2

M. Twarog

Reputation: 2611

You can use AWK to do that. It is a tool specifically created for transforming table-like output.

openvpn --show-pkcs11-ids /usr/lib/libeTPkcs11.so | grep 'Certificate\|DN:\|Serial:\|Serialized id:' | awk -v RS="Certificate" '{print $2,$4,$7}'

Explanation:

grep 'Certificate\|DN:\|Serial:\|Serialized id:' - Choose only interesting lines of output
awk -v RS="Certificate" '{print $2,$4,$7}' - See below comment

Comment: AWK lets you change the record separator with the "-v RS=" parameter. By default it is a newline, so each line of the file is a record, but it can be changed to another string, e.g. "Certificate". (A multi-character RS works in GNU awk and mawk; POSIX only defines the behavior for a single character.)

The output is not an array, but each certificate is described on a separate line that you can pipe to another tool.
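
If an actual bash array of whole "Certificate" blocks is wanted, as the question asks, one option is to skip awk and collect the lines in a plain bash loop. This is a different technique from the pipeline above, sketched here with placeholder input in place of the real openvpn output. The names blocks and current are made up for the illustration, and a shell loop like this is only reasonable for small outputs.

```bash
#!/usr/bin/env bash
# Placeholder input standing in for the openvpn output (values from the question).
input='The following objects are available for use.
Each object shown below may be used as parameter to
--pkcs11-id option please remember to use single quote mark.

Certificate
       DN:             XXX
       Serial:         XXXX
       Serialized id:  XXXX

Certificate
       DN:             XXXX
       Serial:         XXXX
       Serialized id:  XXXX

Certificate
       DN:             XXXXX
       Serial:         XXXX
       Serialized id:  XXXX
'

blocks=()      # one element per Certificate block
current=''     # block currently being collected

while IFS= read -r line; do
    if [[ $line == Certificate* ]]; then
        # A new block starts: store the previous one, if any.
        [[ -n $current ]] && blocks+=("$current")
        current=$line
    elif [[ -n $current && -n $line ]]; then
        # Non-empty line inside a block: append it on a new line.
        current+=$'\n'$line
    fi
done <<< "$input"
[[ -n $current ]] && blocks+=("$current")   # store the last block

printf 'found %d blocks\n' "${#blocks[@]}"
printf '%s\n' "${blocks[1]}"                # second block, all four lines
```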

Upvotes: 1
