Martin S.

Reputation: 21

Bash iterating over file with irregular line arguments

I have a number of irregular .txt files converted from .csv files. They contain the following semicolon-delimited data:

A;B;C;D;E;F;G;H;
A;B;C;D;E;F;G;H;I;J;K;
A;B;C;D;E;F;G;H;I;J;K;L;M;N;
A;B;C;D;E;F;G;H;I;J;K;L;M;N;O;P;Q;

What I would like to do is take specific values from each line. The code I use looks as follows and works well when all lines contain the same number of delimiters:

OIFS=$IFS
IFS=";"
while read var1 var2 var3 var4 var5 var6 var7 var8 var9 var10
do
echo $var2, $var6, $var7, $var8
done < test.txt
IFS=$OIFS

But I'm stuck on implementing code that counts the number of ";" in a line and applies a specific action. Column "B" of each line and whatever exists after column "E" should be taken into account. The minimum number of ";" in each line is 8 and the maximum is 20 (in increments of 3). The desired output is:

For lines containing 8 ";"

echo $B { $F { $G:$H } }

For lines including 11 ";"

echo $B { $F { $G:$H } $I { $J:$K } }

For lines with 14 ";"

echo $B { $F { $G:$H } $I { $J:$K } $L { $M:$N } }

And so on. Is this doable in Bash?
Thank you.

Upvotes: 1

Views: 119

Answers (5)

chepner

Reputation: 531888

Read each line into an array using the -a option to read; this makes dealing with variable-length lines much easier.

while IFS=';' read -r -a vars; do
    # vars[1] is column B
    printf "%s {" "${vars[1]}"
    # Columns F onwards come in groups of three, starting at index 5
    for ((i=5; i<${#vars[@]}; i+=3)); do
        printf " %s { %s:%s }" "${vars[@]:i:3}"
    done
    printf " }\n"
done < test.txt
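
To see why the loop steps by three, here is a small sketch of the two Bash features it relies on: reading a line into an array and slicing that array. The function name show_groups is made up for this illustration, and the sample line is taken from the question.

```bash
#!/bin/bash
# show_groups is a hypothetical helper name, not part of the answer above.
show_groups() {
    local -a vars
    local i
    # IFS=';' applies only to this read; -r keeps backslashes literal.
    # The trailing ';' ends the last field, so it adds no empty element.
    IFS=';' read -r -a vars <<< "$1"
    echo "fields: ${#vars[@]}"
    for ((i = 5; i < ${#vars[@]}; i += 3)); do
        # ${vars[*]:i:3} expands to up to three elements starting at index i.
        echo "group: ${vars[*]:i:3}"
    done
}

show_groups 'A;B;C;D;E;F;G;H;I;J;K;'
```

For this sample line the sketch should report 11 fields and the two groups "F G H" and "I J K".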

Upvotes: 1

Walter A

Reputation: 20022

I think you are doing well so far! You just need some small hints:

  • You can set a shell variable for a single command.
    I changed the handling of IFS a bit.
  • You can check the remaining vars and see if they are empty.
  • I will use ${x} in the vars.
    Not needed for this code, but a good habit.
  • Use read -r, not plain read.
The next code shows what you can do when you know you have a small number of fields. You have at most 20 fields now, so you can add more vars and code to the first solution:

while IFS=";" read -r var1 var2 var3 var4 var5 var6 var7 var8 var9 var10 var11 var12 var13 var14; do
      echo "$var2, $var6, $var7, $var8"
      if [ -z "${var9}" ]; then
         # Nothing after column H
         echo "Line with 8 delimiters"
      elif [ -z "${var12}" ]; then
         # Columns I..K present, nothing after them
         echo "Line with 11 delimiters"
      else
         echo "Line with more than 11 delimiters"
      fi
done < test.txt

I did not complete the code above, since it is not well structured.
It is better to implement this with a function that takes care of the repeating group.

function repeatgroup {
   remaining="$*"
   while [ -n "${remaining}" ]; do
       # Take the next three fields, then keep everything from field 4 on
       rem1=$(echo "$remaining" | cut -d";" -f1)
       rem2=$(echo "$remaining" | cut -d";" -f2)
       rem3=$(echo "$remaining" | cut -d";" -f3)
       remaining=$(echo "$remaining" | cut -d";" -f4-)
       printf "%s { %s:%s } " "${rem1}" "${rem2}" "${rem3}"
   done
}

# var1..var5 hold columns A..E; remaining holds everything from column F on
while IFS=";" read -r var1 var2 var3 var4 var5 remaining; do
      if [ -z "${var5}${remaining}" ]; then
         echo "field shortage"
      elif [ -z "${remaining}" ]; then
         echo "Line without fields after column E"
         echo "${var2} { }"
      else
         printf "%s { " "${var2}"
         repeatgroup "${remaining}"
         printf "}\n"
      fi
done < input

Remark:
Both rem1=$(echo "$remaining" | cut -d";" -f1) and remaining=$(echo "$remaining" | cut -d";" -f4-) can be written with Bash built-in parameter expansion instead of cut, but I thought that would make the code harder to understand. When you need to parse large files, it is worth trying, because it avoids starting several subprocesses for every group.
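
As a rough sketch of what that could look like, here is a variant that uses only parameter expansion. The name repeatgroup_builtin is hypothetical. It assumes the fields come in complete groups of three, and it appends a trailing ";" when one is missing so that the loop always terminates.

```bash
#!/bin/bash
# repeatgroup_builtin is a hypothetical name for this sketch.
repeatgroup_builtin() {
    local remaining=$1 rem1 rem2 rem3
    # Make sure a non-empty argument ends with ';' so each step consumes a field.
    [[ -z $remaining || $remaining == *';' ]] || remaining+=';'
    while [ -n "${remaining}" ]; do
        # ${var%%;*} keeps the text before the first ';'
        # ${var#*;}  drops everything up to and including the first ';'
        rem1=${remaining%%;*}; remaining=${remaining#*;}
        rem2=${remaining%%;*}; remaining=${remaining#*;}
        rem3=${remaining%%;*}; remaining=${remaining#*;}
        printf "%s { %s:%s } " "${rem1}" "${rem2}" "${rem3}"
    done
}

repeatgroup_builtin 'F;G;H;I;J;K;'
echo
```

No external command is started inside the loop, which is where the time goes on large inputs.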

Upvotes: 0

user4918296

Reputation:

A Bash-only solution:

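#!/bin/bash

OLD_IFS=$IFS
IFS=";"
while read -r line; do
    # Unquoted on purpose: split the line on ";" into $1, $2, ...
    set -- $line
    echo -n "$2 { "      # column B
    shift 5              # drop A..E, so $1 is now column F
    while [[ -n $1 ]]; do
        echo -n "$1 { $2:$3 } "
        shift 3          # move on to the next group of three
    done
    echo "}"
done < data
IFS=$OLD_IFS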

Input file:

$ cat data 
A;B;C;D;E;F;G;H;
A;B;C;D;E;F;G;H;I;J;K;
A;B;C;D;E;F;G;H;I;J;K;L;M;N;
A;B;C;D;E;F;G;H;I;J;K;L;M;N;O;P;Q;

Result:

$ ./script.sh 
B { F { G:H } }
B { F { G:H } I { J:K } }
B { F { G:H } I { J:K } L { M:N } }
B { F { G:H } I { J:K } L { M:N } O { P:Q } }

Solution 2

Same but with arrays

#!/bin/bash

OLD_IFS=$IFS
IFS=";"
os=5                     # index of column F, the first repeating field
while read -r line; do
    c=0                  # number of groups printed so far
    a=($line)            # unquoted on purpose: split on ";"
    echo -n "${a[1]} { "
    while [[ -n ${a[$((os+c*3))]} ]]; do
        echo -n "${a[$((os+c*3))]} { "
        echo -n "${a[$((os+c*3+1))]}:${a[$((os+c*3+2))]} } "
        ((c++))
    done
    echo "}"
done < data
IFS=$OLD_IFS

Upvotes: 0

WaelJ

Reputation: 3012

Alternatively, you can use Python to do what you want (if I understood it correctly):

import fileinput

# http://stackoverflow.com/questions/34576772/bash-iterating-over-file-with-irregular-line-arguments/34576899#34576899

def columns_are_valid(columns):
    # At least 8 columns, then complete groups of three: 8, 11, 14, ...
    return len(columns) >= 8 and len(columns) % 3 == 2

# Returns every three columns as a tuple
# Example: 1,2,3,4,5,6,7,8,9  ->  (1,2,3) , (4,5,6) , (7,8,9)
def every_three(rest_columns):
    it = iter(rest_columns)
    # zip() stops cleanly when the shared iterator is exhausted
    return zip(it, it, it)


for line in fileinput.input():
    line = line.rstrip(';\n')  # remove trailing newline and ';'
    columns = line.split(';')  # split by ';'
    assert columns_are_valid(columns)

    column_b = columns[1]

    # Selects columns F onwards
    columns_f_onwards = columns[5:]

    # Format parts like '$F { $G:$H }'
    parts = ['%s { %s:%s }' % (a, b, c) for a, b, c in every_three(columns_f_onwards)]
    space_delimited_parts = ' '.join(parts)

    print('%s { %s }' % (column_b, space_delimited_parts))

Example run:

 % python myscript.py

With input:

A;B;C;D;E;F;G;H;
A;B;C;D;E;F;G;H;I;J;K;
A;B;C;D;E;F;G;H;I;J;K;L;M;N;
A;B;C;D;E;F;G;H;I;J;K;L;M;N;O;P;Q;

Outputs:

B { F { G:H } }
B { F { G:H } I { J:K } }
B { F { G:H } I { J:K } L { M:N } }
B { F { G:H } I { J:K } L { M:N } O { P:Q } }

Upvotes: 0

WaelJ

Reputation: 3012

I'm not sure I fully understand what you want to do, but this might help as a first step.

Each line's column "B" and whatever exist after column "E" should be taken into account.

For this you can use the cut command:

cut -d ';' -f 2,6-

Where -d ';' sets the delimiter and -f 2,6- selects fields 2 and 6 onwards.

This will select column B and columns F onwards.

You can also change the delimiter used in the output with --output-delimiter (a GNU cut option).
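
A short sketch of both forms, fed with one sample line from the question. The wrapper names select_fields and select_fields_spaced are hypothetical, and the second one assumes GNU cut, since --output-delimiter is not available in every cut implementation.

```bash
#!/bin/bash
# Both function names are made up for this illustration; they read from stdin.

# Keep field 2 (column B) and fields 6 onwards (column F and later).
select_fields() {
    cut -d ';' -f 2,6-
}

# Same selection, but join the output fields with a space (GNU cut only).
select_fields_spaced() {
    cut -d ';' -f 2,6- --output-delimiter=' '
}

printf '%s\n' 'A;B;C;D;E;F;G;H;' | select_fields
```

The selected fields still have to be grouped in threes afterwards, so this only covers the first step.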

Upvotes: 1
