Tomekke

Reputation: 15

Put lines of a text file in an array in bash

I'm taking over a bash script from a colleague. It reads a file, processes it, and prints another file based on the current line of the while loop.

I now need to add some features to it. The one I'm having trouble with is reading a file and putting each line into an array, where the 2nd column of a line can be empty, e.g.:

For a text file with \t as separator:

A\tB\tC
A\t\tC

For a CSV file, the same but with , as the separator:

A,B,C
A,,C

Which should then give

["A","B","C"] or ["A", "", "C"]

The code I took over is as follows:

while IFS=$'\t\r' read -r -a col; do
    # Process the array, put that into a file
    lp -d $printer $file_to_print
done < $input_file

This works if B is filled, but B now needs to be empty sometimes. When the input file leaves it empty, the created array, and thus the output file to print, just skips the empty cell (the array is then ["A","C"]).

I tried writing the whole block in awk, but this brought its own set of problems and made it difficult to call the lp command to print.

So my question is, how can I preserve the empty cell from the line into my bash array, so that I can call on it later and use it?

Thank you very much. I know this might be quite confusing, so please ask and I'll clarify.

Edit: On request, here's the awk code I tried. The issue is that it only prints the last print request, even though I know it loops over the whole file and the lp command is inside the loop.

awk 'BEGIN {
    inputfile="'"${optfile}"'"
    outputfile="'"${file_loc}"'"
    printer="'"${printer}"'"
    while (getline < inputfile){
      print "'"${prefix}"'" > outputfile
      split($0,ft,"'"${IFSseps}"'");
      if (length(ft[2]) == 0){
        print "CODEPAGE 1252\nTEXT 465,191,\"ROMAN.TTF\",180,7,7,\""ft[1]"\"" >> outputfile
        size_changer = 0
      } else {
        print "CODEPAGE 1252\nTEXT 465,191,\"ROMAN.TTF\",180,7,7,\""ft[1]"_"ft[2]"\"" >> outputfile
        size_changer = 1
      }
      if ( split($0,ft,"'"${IFSseps}"'") > 6)
        maxcounter = 6;
      else
        maxcounter = split($0,ft,"'"${IFSseps}"'");
      for (i = 3; i <= maxcounter; i++){
        x=191-(i-2)*33
        print "CODEPAGE 1252\nTEXT 465,"x",\"ROMAN.TTF\",180,7,7,\""ft[i]"\"" >> outputfile
      }
      print "PRINT ""'"${copies}"'"",1" >> outputfile
      close(outputfile)
      "'"`lp -d ${printer} ${file_loc}`"'"
    }
    close("'"${file_loc}"'");
  }'

EDIT2: While still looking for a solution, I tried the following code without success. This is odd, because running printf on its own, without putting the result into an array, keeps the formatting intact.

$ cat testinput | tr '\t' '>'
A>B>C
A>>C

# Should normally be empty on the second output line
$ while read line; do IFS=$'\t' read -ra col < <(printf "$line"); echo ${col[1]}; done < testinput
B
C

Upvotes: 1

Views: 760

Answers (3)

Paul Hodges

Reputation: 15418

It may just be your "# Process the array, put that into a file" part.

IFS=, read -ra ray <<< "A,,C"
for e in "${ray[@]}"; do o="$o\"$e\","; done
echo "[${o%,}]"
["A","","C"]

See @Glenn's excellent answer regarding tabs.

My simple data file:

$: cat x # tab delimited, empty field 2 of line 2
a   b   c
d       f

My test:

while IFS=$'\001' read -r a b c; do
  echo "a:[$a] b:[$b] c:[$c]"
done < <(tr "\t" "\001"<x)
a:[a] b:[b] c:[c]
a:[d] b:[] c:[f]

Note that I used ^A (a 001 byte) but you might be able to use something as simple as a comma or pipe (|) character. Choose based on your data.

Upvotes: 0

markp-fuso

Reputation: 35336

One bash example using parameter expansion where we convert the delimiter into a \n and let mapfile read in each line as a new array entry ...

For tab-delimited data:

for line in $'A\tB\tC' $'A\t\tC'
do
    mapfile -t array <<< "${line//$'\t'/$'\n'}"
    echo "############# ${line}"
    typeset -p array
done

############# A B       C
declare -a array=([0]="A" [1]="B" [2]="C")
############# A         C
declare -a array=([0]="A" [1]="" [2]="C")

NOTE: The $'...' construct ensures the \t is treated as a single <tab> character as opposed to the two literal characters \ + t.

For comma-delimited data:

for line in 'A,B,C' 'A,,C'
do
    mapfile -t array <<< "${line//,/$'\n'}"
    echo "############# ${line}"
    typeset -p array
done

############# A,B,C
declare -a array=([0]="A" [1]="B" [2]="C")
############# A,,C
declare -a array=([0]="A" [1]="" [2]="C")

NOTE: This obviously (?) assumes the desired data does not contain a comma (,).
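
The two loops above differ only in the delimiter, so they could be folded into one helper. This is a rough sketch, not something from the original script: the function name split_fields and the global array name fields are made up for illustration, and it assumes bash 4 or later (for mapfile) and a single-character delimiter that never appears inside a field.

```bash
#!/usr/bin/env bash

# Hypothetical helper: split "$1" on the single-character delimiter "$2"
# and store the pieces in the global array "fields", keeping empty ones.
split_fields() {
    local line=$1 delim=$2
    # Turn every delimiter into a newline, then read one element per line.
    # The delimiter is quoted so it is matched literally, not as a glob.
    mapfile -t fields <<< "${line//"$delim"/$'\n'}"
}

split_fields $'A\t\tC' $'\t'
declare -p fields    # declare -a fields=([0]="A" [1]="" [2]="C")

split_fields 'A,,C' ','
declare -p fields    # declare -a fields=([0]="A" [1]="" [2]="C")
```

With this approach a trailing empty field is kept as well: a line such as 'A,B,' ends up as three elements, the last one empty.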

Upvotes: 0

glenn jackman

Reputation: 247162

For tab, it's complicated.

From 3.5.7 Word Splitting in the manual:

A sequence of IFS whitespace characters is also treated as a delimiter.

Since tab is an "IFS whitespace character", sequences of tabs are treated as a single delimiter:

IFS=$'\t' read -ra ary <<<$'A\t\tC'
declare -p ary
declare -a ary=([0]="A" [1]="C")

What you can do is translate tabs to a non-whitespace character, assuming it does not clash with the actual data in the fields:

line=$'A\t\tC'
IFS=, read -ra ary <<<"${line//$'\t'/,}"
declare -p ary
declare -a ary=([0]="A" [1]="" [2]="C")

To avoid the risk of colliding with commas in the data, we can use an unusual ASCII character, FS (octal 034):

line=$'A\t\tC'
printf -v FS '\034'
IFS="$FS" read -ra ary <<<"${line//$'\t'/"$FS"}"

# or, without the placeholder variable
IFS=$'\034' read -ra ary <<<"${line//$'\t'/$'\034'}"

declare -p ary
declare -a ary=([0]="A" [1]="" [2]="C")
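
As a sketch of how this might slot into the while loop from the question: the names col and input_file are borrowed from the question, while split_tsv_line and second_fields are hypothetical helpers added for the demo. The temp file only stands in for the real input, and the code assumes the byte 034 never occurs in the data. The lp call is left as a comment because the printer setup is not known here.

```bash
#!/usr/bin/env bash

# Hypothetical helper: split one tab-separated line into the global
# array "col", keeping empty fields.
split_tsv_line() {
    local line=$1
    local sep=$'\034'    # ASCII FS, assumed never to occur in the data
    # Swap each tab for the non-whitespace separator, then split on it.
    IFS=$sep read -r -a col <<< "${line//$'\t'/$sep}"
}

# Sample data standing in for the real input file.
input_file=$(mktemp)
printf 'A\tB\tC\nA\t\tC\n' > "$input_file"

second_fields=()         # collected only so the result can be inspected
while IFS= read -r line; do
    line=${line%$'\r'}   # drop a trailing CR from Windows-style files
    split_tsv_line "$line"
    second_fields+=("${col[1]}")
    printf '[%s] [%s] [%s]\n' "${col[0]}" "${col[1]}" "${col[2]}"
    # Process the array, put that into a file, then e.g.:
    # lp -d "$printer" "$file_to_print"
done < "$input_file"
rm -f "$input_file"
```

On the sample data this prints [A] [B] [C] followed by [A] [] [C], so the empty second column keeps its position in the array.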

Upvotes: 2
