lisprogtor

Reputation: 5749

awk: How to store whole file in awk multidimensional arrays?

I am trying to store a whole file in a multidimensional array (in a bash shell on Ubuntu), but have not found an elegant way to do it. Can you help?

I have this file with "|" as the field separator:

john  |  violin  |  expert level |  math grade  |  95
doe   |  piano   |  novice       |  math grade  |  100  |  extra info | variable length
jane  |  cello   |  beginner     |  physics     |  90
mary  |  flute   |  advanced     |  chemistry   |  95   |  college next year

What I want to do is to store all the fields in a multidimensional array:

awk 'BEGIN { x = 0;
              while ((getline oneLine < "studentFile") > 0) {
                    theFile[x] = oneLine;
                    ++x;
              }
             close("studentFile");
     } 
     { for (y in theFile) print theFile[y]; }' studentFile <----- if I don't put a file here, the command won't run

But this is only one-dimensional; how do I store the varying-length lines in a 2-D array?

I have also tried:

 awk 'BEGIN { x = 0;
              while ((getline oneLine < "studentFile") > 0) {
                      theFile[x] = split(oneLine, arr, "|");
                      ++x;
              }
              close("studentFile");
            }
            { for (y in theFile) {
                  for (z in theFile[y]) {
                      print theFile[y][z];
                  }
              }
            }' studentFile <----- if I don't put a file here, the command won't run

But it says: "awk: cmd. line:9: (FILENAME=studentFile FNR=1) fatal: attempt to use a scalar value as array"

I also tried to fix the error:

split(theFile[y], newArray, "|"); 
for (z in newArray) {
    print newArray[z];
}

but it only printed the indices. Now I am out of ideas. Please help!

Thank you very very much !!!

Upvotes: 0

Views: 1128

Answers (3)

RARE Kpop Manifesto

Reputation: 2875

If you're willing to use gawk, you can split each row directly into a sub-array without looping through the fields:


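gawk -be '{
        # with _ unset, $_ is $0; __[NR] becomes the sub-array for this row
        split($_, __[NR])
} END {
    for (___ in __) {
        # ____ = _<_ resets the comma flag to 0; %.0s prints nothing for it
        printf(" row %4s : {%.0s",
                    ___, ____ = _<_)
        for(_ in __[___])
            # precision 0 suppresses the "," before the first field only
            printf("%.*s %s",____ || ____++, ",",
                                      __[___][_])
        printf(" }\n")
}   }'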

     1  17 13 19 25 31
     2  3915 2127 33 
     3  5 1117 23 29 35 

 row    1 : { 17, 13, 19, 25, 31 }
 row    2 : { 3915, 2127, 33 }
 row    3 : { 5, 1117, 23, 29, 35 }

After each row's split(), you can also add something like __[NR][+_] = $_ if you want to keep the original row with its formatting untouched (with _ unset, +_ is 0 and $_ is $0). Keep in mind that for(_ in __[___]) would then also visit that full row you've added, since it iterates over all indices of the sub-array.

(Strangely enough, the empty-string array index __[""] is perfectly valid in awk.)

delete __[2] deletes only the sub-array for row 2, so rows 1 and 3 remain accessible afterwards.
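To make the two notes above concrete, here is a minimal sketch with readable names. The names rows, n and demo_rows are illustrative, not from the answer, and the sample input is made up. The gawk manual describes versions that refuse to split() into a sub-array that does not exist yet, so the sketch creates the sub-array first as a precaution; treat that step as an assumption about your gawk version rather than a requirement.

```shell
# Sketch only: needs gawk 4+ (arrays of arrays). rows, n and demo_rows are illustrative names.
demo_rows() {
    printf '17 13 19\n3915 2127\n5 1117 23\n' |
    gawk '{
        rows[NR][0] = ""              # force rows[NR] to be an array before split()
        n[NR] = split($0, rows[NR])   # split() clears the sub-array, then fills 1..n
        rows[NR][0] = $0              # index 0 keeps the untouched line
    }
    END {
        delete rows[2]                # removes only the sub-array for row 2
        for (r = 1; r <= NR; r++) {
            if (!(r in rows)) continue            # row 2 is gone, skip it
            printf "row %d (%s):", r, rows[r][0]
            for (i = 1; i <= n[r]; i++)           # numeric loop, so index 0 is not visited
                printf " [%s]", rows[r][i]
            print ""
        }
    }'
}
```

Calling demo_rows prints rows 1 and 3 only. Looping from 1 to the stored field count, rather than with for (i in rows[r]), is a design choice here: it keeps the fields in order and leaves out the extra element at index 0.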

Upvotes: 0

user3442743

Reputation:

Another way, without true 2D arrays:

awk -F' +\\\| +' '{for(i=1;i<=NF;i++)a[NR,i]=$i}
    END{for(i=1;i<=NR;i++){x=j="";while((i,++j) in a)x=x?x","a[i,j]:a[i,j];print x}}' file

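Readable version:

awk -F' +\\\| +' '

    # store every field under the combined key (line number, field number)
    {for(i=1;i<=NF;i++)Array[NR,i]=$i}

    END{
        for(i=1;i<=NR;i++){
            x=j=""
            # testing membership with "in" keeps empty or zero-valued fields
            while((i,++j) in Array)
                 x=x?x","Array[i,j]:Array[i,j]
            print x
        }
    }' file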

Using a while loop instead of a fixed-count for loop ensures that all fields are printed even when the number of fields varies from line to line.

Output

john,violin,expert level,math grade,95
doe,piano,novice,math grade,100,extra info,variable length
jane,cello,beginner,physics,90
mary,flute,advanced,chemistry,95,college next year
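A variation on the same idea, as a minimal sketch: remember each row's field count so the END block needs no sentinel test and also copes with fields that are empty or 0. The names cell, nf and join_fields are illustrative, and the simpler separator ' *[|] *' is an assumption that should behave like the one above for this data. The second input line is made up to show an empty field and a zero.

```shell
# Sketch only: portable awk, no gawk extensions. cell, nf and join_fields are illustrative names.
join_fields() {
    awk -F' *[|] *' '
        {
            nf[NR] = NF                      # remember how many fields this row has
            for (i = 1; i <= NF; i++)
                cell[NR, i] = $i             # the real key is NR SUBSEP i
        }
        END {
            for (r = 1; r <= NR; r++) {
                line = ""
                for (i = 1; i <= nf[r]; i++)
                    line = line (i > 1 ? "," : "") cell[r, i]
                print line
            }
        }'
}

printf 'john  |  violin  |  expert level |  math grade  |  95\nzed|0||physics\n' | join_fields
```

Storing the count costs one extra array, but the loop bounds are then explicit for every row.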

Upvotes: 0

glenn jackman

Reputation: 247042

gawk -F '[[:blank:]]*\\\|[[:blank:]]*' '
    {for (i=1; i<=NF; i++) data[NR][i] = $i}   # this populates the array, line-by-line
    END {
        # now, we iterate over it
        for (n=1; n<=NR; n++) {
            sep = ""
            for (i=1; i<=length(data[n]); i++) {
                printf "%s%s", sep, data[n][i]
                sep = ","
            }
            print ""
        }
    }
' file

Output

john,violin,expert level,math grade,95
doe,piano,novice,math grade,100,extra info,variable length
jane,cello,beginner,physics,90
mary,flute,advanced,chemistry,95,college next year

Upvotes: 1
