Christoph

Reputation: 735

How to split a string by a defined string with multiple characters in bash?

Following output consisting of several devices needs to be parsed:

 0 interface=ether1 address=172.16.127.2 address4=172.16.127.2
   address6=fe80::ce2d:e0ff:fe00:05 mac-address=CC:2D:E0:00:00:08
   identity="myrouter1" platform="MikroTik" version="6.43.8 (stable)"

 1 interface=ether2 address=10.5.44.100 address4=10.5.44.100
   address6=fe80::ce2d:e0ff:fe00:07 mac-address=CC:2D:E0:00:00:05
   identity="myrouter4" platform="MikroTik" version="6.43.8 (stable)"

 3 interface=ether4 address=fe80::ba69:f4ff:fe00:0017
   address6=fe80::ba69:f4ff:fe00:0017 mac-address=B8:69:F4:00:00:07
   identity="myrouter2" platform="MikroTik" version="6.43.8 (stable)"

...

10 interface=ether5 address=10.26.51.24 address4=10.26.51.24
   address6=fe80::ba69:f4ff:fe00:0039 mac-address=B8:69:F4:00:00:04
   identity="myrouter3" platform="MikroTik" version="6.43.8 (stable)"

11 interface=ether3 address=10.26.51.100 address4=10.26.51.100
   address6=fe80::ce2d:e0ff:fe00:f00 mac-address=CC:2D:E0:00:00:09
   identity="myrouter5" platform="MikroTik" version="6.43.8 (stable)"

Edit: for simplicity I shortened and anonymized the output. In the real output the first block has 7 lines, the second block has 5 lines, the third block has 7 lines and the fourth block has 4 lines, so the number of lines per block is inconsistent.

Basically it's the output from a MikroTik device: "/ip neighbor print detail"

Ideally I could access every device (= number) on its own, then access each setting=value pair of one device separately, and finally read settings with something like $device[0][identity].

I tried setting IFS='\d{1,2} ', but it seems IFS only works with single-character separators and does not take a regular expression.

Searching the web, I didn't find a way to accomplish this. Am I going about this the wrong way, and is there another way to solve it?

Thanks in advance!

Edit: I found the solution "Split file by multiple line breaks", which helped me get to:

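devices=()
COUNT=0
# read strips the leading indentation; -r keeps backslashes literal
while read -r LINE
do
    # a non-empty line is appended to the current device, a blank line starts the next one
    [ "$LINE" ] && devices[$COUNT]+="$LINE " || { (( ++COUNT )); }
done < devices.txt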

Then I could use @Kamil's solution to easily access the values.

Upvotes: 0

Views: 1404

Answers (3)

David C. Rankin

Reputation: 84561

While your precise output format is a bit unclear, bash offers an efficient way to parse the data using process substitution. Similar to command substitution, process substitution lets you redirect the output of a command (or a whole pipeline) to the stdin of a loop. This allows you to read the result of a set of commands that reformat your MikroTik file into a single line for each device.
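As a minimal sketch of why process substitution is handy for this (the variable names piped and substituted and the three-line input are made up for illustration): with bash's default settings a loop on the right-hand side of a pipe runs in a subshell, so anything it sets is lost, while a loop fed by < <(...) runs in the current shell.

```bash
#!/usr/bin/env bash
# Illustration only: the three input lines stand in for real device output.

piped=0
printf '%s\n' a b c | while read -r _; do piped=$((piped + 1)); done
echo "after pipe: $piped"                       # 0, the loop ran in a subshell

substituted=0
while read -r _; do substituted=$((substituted + 1)); done < <(printf '%s\n' a b c)
echo "after process substitution: $substituted" # 3, the loop ran in this shell
```

That is the reason a value collected inside the loop, such as an id or an array of devices, is still available after done.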

While there are a number of ways to do it, one way to handle the gymnastics needed to reformat each device's multi-line information into a single line is with tr and sed. Use tr first to replace each '\n' with an '_' (or pick your favorite character not used elsewhere in the data), and then again to "squeeze" each run of spaces to a single space (technically not required, but for completeness). After replacing the '\n' with '_' and squeezing spaces, you simply use two sed expressions to change each "__" (resulting from a blank line) back into a '\n' and then to remove all remaining '_'.
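To make the intermediate result visible, here is a small sketch of that pipeline on its own. The function name flatten and the two-device sample are made up for illustration, and the '\n' in the sed replacement is assumed to be handled as in GNU sed; other sed implementations may treat it differently.

```bash
#!/usr/bin/env bash
# flatten is a hypothetical wrapper around the tr/sed pipeline described above.
# Caveat: any '_' that is part of the data (e.g. in an identity) is removed too.
flatten() {
    tr '\n' '_' | tr -s ' ' | sed -e 's/__/\n/g' -e 's/_//g'
}

# Made-up two-device sample; the separating line must be truly empty.
sample=$(printf '%s\n' \
    ' 0 interface=ether1 address=172.16.127.2' \
    '   identity="myrouter1"' \
    '' \
    ' 1 interface=ether2 address=10.5.44.100' \
    '   identity="myrouter4"')

flat=$(printf '%s\n' "$sample" | flatten)
printf '%s\n' "$flat"    # one line per device, each starting with its number
```

Note that when the input does not end in a blank line, the last flattened line comes out without a trailing newline, which matters for a plain while read loop.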

With that you can read your device number n and the remainder of the line holding your setting=value pairs. To locate the "identity=" pair, simply convert the line into an array and loop over it using parameter expansions (for substring removal); that lets you save the "identity" value as id (trimming the double-quotes is left to you).
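The substring-removal expansions can be tried in isolation. This is a sketch with a hard-coded pair (the variable names pair, key and value are illustrative); it splits on the first '=' and also shows one way to trim the double quotes.

```bash
#!/usr/bin/env bash
pair='identity="myrouter1"'

key=${pair%%=*}       # remove the longest suffix matching '=*'  -> identity
value=${pair#*=}      # remove the shortest prefix matching '*=' -> "myrouter1"
value=${value#\"}     # drop a leading double quote, if present
value=${value%\"}     # drop a trailing double quote, if present

printf '%s -> %s\n' "$key" "$value"    # identity -> myrouter1
```

Splitting on the first '=' rather than the last keeps a value intact even if it happens to contain an '=' itself.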

Now it is simply a matter of outputting the values (or doing whatever you wish with them). While you could loop again and output the array values, it is just as easy to pass the intentionally unquoted line to printf and let the printf-trick handle separating the setting=value pairs for output. Lastly, you form your $device[0][identity] identifier and output it as the final line in the device block.
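The printf-trick relies on printf reusing its format string for every extra argument. A short sketch with a hard-coded line (the variable names line and out are illustrative):

```bash
#!/usr/bin/env bash
line='interface=ether1 address=172.16.127.2 identity="myrouter1"'

# Unquoted on purpose: word splitting turns $line into three arguments and
# printf applies the format once per argument.
# shellcheck disable=SC2086
out=$(printf '  %s\n' $line)
printf '%s\n' "$out"
```

This prints the three pairs on separate, indented lines. The same word splitting also breaks a quoted value that contains a space, which is why version="6.43.8 (stable)" ends up on two lines in the example output.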

Putting it all together, you could do something like the following:

#!/bin/bash

## read each flattened line from the process substitution; the [ -n "$n" ]
## test keeps a final line that has no trailing newline
while read -r n line || [ -n "$n" ]; do
    id=                         ## reset id for each device
    a=( $line )                 ## split line into array (unquoted on purpose)
    for i in "${a[@]}"; do      ## search array, set id
        [ "${i%=*}" = "identity" ] && id="${i##*=}"
    done
    echo "device=$n"            ## output device=
    printf "  %s\n" $line       ## output setting=value (unquoted on purpose)
    printf "  \$device[%s][%s]\n" "$n" "$id"    ## $device[0][identity]
done < <(tr '\n' '_' < "$1" | tr -s ' ' | sed -e 's/__/\n/g' -e 's/_//g')

Example Use/Output

Note: the script takes the name of the file to parse as its first argument.

$ bash mikrotik_parse.sh mikrotik
device=0
  interface=ether1
  address=172.16.127.2
  address4=172.16.127.2
  address6=fe80::ce2d:e0ff:fe00:05
  mac-address=CC:2D:E0:00:00:08
  identity="myrouter1"
  platform="MikroTik"
  version="6.43.8
  (stable)"
  $device[0]["myrouter1"]
device=1
  interface=ether2
  address=10.5.44.100
  address4=10.5.44.100
  address6=fe80::ce2d:e0ff:fe00:07
  mac-address=CC:2D:E0:00:00:05
  identity="myrouter4"
  platform="MikroTik"
  version="6.43.8
  (stable)"
  $device[1]["myrouter4"]
device=3
  interface=ether4
  address=fe80::ba69:f4ff:fe00:0017
  address6=fe80::ba69:f4ff:fe00:0017
  mac-address=B8:69:F4:00:00:07
  identity="myrouter2"
  platform="MikroTik"
  version="6.43.8
  (stable)"
  $device[3]["myrouter2"]

Look things over and let me know if you have further questions. As mentioned at the beginning, you haven't defined an explicit output format you are looking for, but gleaning what information was in the question, this should be close.

Upvotes: 1

KamilCuk

Reputation: 140990

Bash likes single long rows with delimiter-separated values, so first we need to convert your file to such a format.

Below I read 4 lines at a time from the input. I noticed that each record in the sample spans only 4 lines (three lines of data plus an empty line), so I just concatenate the 4 lines and treat them as a single line.

while
    IFS= read -r line1 &&
    IFS= read -r line2 &&
    IFS= read -r line3 &&
    IFS= read -r line4 &&
    line="$line1 $line2 $line3 $line4"
do
    if [ -n "$line4" ]; then
        echo "ERR: 4th line should be empty - $line4 !" >&2
        exit 4
    fi

    if ! num=$(printf "%d" ${line:0:3}); then
        echo "ERR: reading number" >&2
        exit 1
    fi

    line=${line:3}
    # bash variables can't have `-`
    line=${line/mac-address=/mac_address=}

    # unsafe magic
    vars=(interface address address4
        address6 mac_address identity platform version)
    for v in "${vars[@]}"; do
        unset "$v"
        if ! <<<"$line" grep -q "$v="; then
            echo "ERR: line does not have $v= part!" >&2
            exit 1
        fi
    done

    # eval call
    if ! eval "$line"; then
        echo "ERR: eval line=$line" >&2
        exit 1
    fi

    for v in "${vars[@]}"; do
        if [ -z "${!v}" ]; then
            echo "ERR: variable $v was not set in eval!" >&2
            exit 1;
        fi
    done

    echo "$num: $interface $address $address4 $address6 $mac_address $identity $platform $version"


done < file
  • then I retrieve the leading number from the line, which I suspect was printed with printf "%3d" so I just slice the line ${line:0:3}
  • for the rest of the line I intend to use eval. In this case I trust upstream, but I try to assert some cases (variable not defined in the line, some syntax error and similar)
  • then the magic eval "$line" happens, which assigns all the variables in my shell
  • after that I can use variables from the line like normal variables
  • live example at tutorialspoint
  • Eval command and security issues

Upvotes: 0

paulsm4

Reputation: 121649

  1. I think you're on the right track with IFS.

  2. Try piping IFS=$'\n\n' (to break apart the line groups by interface) through cut (to extract the specific field(s) you want for each interface).
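One caveat worth checking before building on this: IFS is a set of single-character delimiters, so $'\n\n' behaves like $'\n' and splits on every line rather than on blank lines. The sketch below demonstrates that with a made-up two-device sample and then swaps in a different technique, awk's paragraph mode (RS=""), to get one record per device before handing it to cut. The variable names are illustrative.

```bash
#!/usr/bin/env bash
# Made-up two-device sample; the separating line must be truly empty.
sample=$(printf '%s\n' \
    ' 0 interface=ether1 address=172.16.127.2' \
    '   identity="myrouter1"' \
    '' \
    ' 1 interface=ether2 address=10.5.44.100' \
    '   identity="myrouter4"')

# Splitting with IFS=$'\n\n' yields one field per non-empty line.
old_ifs=$IFS; IFS=$'\n\n'; set -f
parts=( $sample )
set +f; IFS=$old_ifs
echo "fields: ${#parts[@]}"    # 4 lines, not 2 devices

# Alternative: with RS="" awk reads blank-line-separated blocks as records;
# $1 = $1 rebuilds each record with single spaces, so cut can pick fields.
firsts=$(printf '%s\n' "$sample" |
    awk 'BEGIN { RS = "" } { $1 = $1; print }' |
    cut -d' ' -f1,2)
printf '%s\n' "$firsts"
```

With this sample the last command prints the device number and the interface pair for each of the two devices.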

Upvotes: 0
