Reputation: 5749
I am trying to store a whole file in a multidimensional array (in the bash shell on Ubuntu), but have not found an elegant way to do it. Can you help?
I have this file with "|" as the field separator:
john | violin | expert level | math grade | 95
doe | piano | novice | math grade | 100 | extra info | variable length
jane | cello | beginner | physics | 90
mary | flute | advanced | chemistry | 95 | college next year
What I want to do is to store all the fields in a multidimensional array:
awk 'BEGIN { x = 0;
    while ((getline oneLine < "studentFile") > 0) {
        theFile[x] = oneLine;
        ++x;
    }
    close("studentFile");
}
{ for (y in theFile) print theFile[y]; }' studentFile <----- if I don't put a file here, the command won't run
But this is only one dimensional; how do I store the varying length lines in 2-D array?
I have also tried:
awk 'BEGIN { x = 0;
    while ((getline oneLine < "studentFile") > 0) {
        theFile[x] = split(oneLine, arr, "|");
        ++x;
    }
    close("studentFile");
}
{ for (y in theFile) {
    for (z in theFile[y]) {
        print theFile[y][z];
    }
}
}' studentFile <----- if I don't put a file here, the command won't run
But it says: "awk: cmd. line:9: (FILENAME=studentFile FNR=1) fatal: attempt to use a scalar value as array"
I also tried to fix the error:
split(theFile[y], newArray, "|");
for (z in newArray) {
print newArray[z];
}
but it only printed the indices. Now I am out of ideas. Please help!
Thank you very very much !!!
Upvotes: 0
Views: 1128
Reputation: 2875
If you're willing to use gawk, you can split each row directly into a sub-array without looping through the fields:
gawk -be '{
    split($_, __[NR])
} END {
    for (___ in __) {
        printf(" row %4s : {%.0s",
               ___, ____ = _<_)
        for (_ in __[___])
            printf("%.*s %s", ____ || ____++, ",",
                   __[___][_])
        printf(" }\n")
    } }'
1 17 13 19 25 31
2 3915 2127 33
3 5 1117 23 29 35
row 1 : { 17, 13, 19, 25, 31 }
row 2 : { 3915, 2127, 33 }
row 3 : { 5, 1117, 23, 29, 35 }
After each row's split() you can also add something like __[NR][+_] = $_ if you want to keep the original row's untampered formatting. Keep in mind that the inner loop for (_ in __[___]) would then also visit that full row, since "in" iterates over every index of the sub-array. (Strangely enough, the empty-string array index __[""] is perfectly valid in awk.)
delete __[2] removes only the sub-array for row 2, so rows 1 and 3 are still accessible afterwards.
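For anyone who finds the underscore names hard to follow, here is a minimal sketch of the same idea with ordinary names. The names rows and rows_demo are only illustrative, and the input is the three sample rows shown above. It assumes gawk 4 or later. As far as I know, some gawk releases refuse to split() into a sub-array that does not exist yet, so the sketch creates rows[NR] first to be on the safe side.

```shell
rows_demo() {
  printf '%s\n' '17 13 19 25 31' '3915 2127 33' '5 1117 23 29 35' |
  gawk '
    {
      rows[NR][0] = ""       # force rows[NR] to be an array (assumed necessary on older gawk)
      split($0, rows[NR])    # split() clears rows[NR], then fills indices 1..n
      rows[NR][0] = $0       # optional: keep the untouched line at index 0
    }
    END {
      delete rows[2]         # removes only the sub-array for row 2
      for (r = 1; r <= NR; r++) {
        if (!(r in rows)) continue       # row 2 is gone, skip it
        n = length(rows[r]) - 1          # do not count index 0
        printf "row %d (%d fields):", r, n
        for (f = 1; f <= n; f++) printf " %s", rows[r][f]
        print ""
      }
    }'
}
rows_demo
```

This should print "row 1 (5 fields): 17 13 19 25 31" and "row 3 (5 fields): 5 1117 23 29 35". Looping over 1..n rather than using "for (f in rows[r])" keeps the fields in order and skips the full line stored at index 0.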
Upvotes: 0
Reputation:
Another way without true 2D arrays
awk -F' +\\\| +' '{for(i=1;i<=NF;i++)a[NR,i]=$i}
END{for(i=1;i<=NR;i++){x=j="";while(a[i,++j])x=x?x","a[i,j]:a[i,j];print x}}' file
Readable
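awk -F' +\\\| +' '
{ for (i = 1; i <= NF; i++) Array[NR,i] = $i }   # store each field under a "row,column" key
END {
    for (i = 1; i <= NR; i++) {
        x = j = ""
        while (Array[i,++j])                      # walk the columns until one is empty
            x = x ? x "," Array[i,j] : Array[i,j]
        print x
    }
}' file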
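Using the while instead of a for loop means every field gets printed even though the number of fields varies from line to line. Note that the loop stops at the first field that is empty or 0, so this only works when no field in the middle of a line has such a value.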
john,violin,expert level,math grade,95
doe,piano,novice,math grade,100,extra info,variable length
jane,cello,beginner,physics,90
mary,flute,advanced,chemistry,95,college next year
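If a field in the middle of a line can be empty or 0, one possible variation is to remember each row's field count and loop up to that instead of testing the cell value. This is only a sketch: the names cell, nf and subsep_demo are made up for illustration, the two input lines are invented test data, and the simpler separator ' *[|] *' is swapped in for the one above. It uses only the comma-subscript (SUBSEP) arrays of standard awk, so gawk is not required.

```shell
subsep_demo() {
  printf '%s\n' 'john | violin | 0 | math' 'doe | piano |  | 100 | extra' |
  awk -F ' *[|] *' '
    {
      nf[NR] = NF                            # remember how many fields this row has
      for (i = 1; i <= NF; i++) cell[NR, i] = $i
    }
    END {
      for (r = 1; r <= NR; r++) {
        line = ""
        for (i = 1; i <= nf[r]; i++)         # bounded by the stored count, not by the value
          line = line (i > 1 ? "," : "") cell[r, i]
        print line
      }
    }'
}
subsep_demo
```

With this input the output should be "john,violin,0,math" followed by "doe,piano,,100,extra", so the 0 and the empty field both survive.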
Upvotes: 0
Reputation: 247042
gawk -F '[[:blank:]]*\\\|[[:blank:]]*' '
    {for (i=1; i<=NF; i++) data[NR][i] = $i}   # this populates the array, line-by-line
    END {
        # now, we iterate over it
        for (n=1; n<=NR; n++) {
            sep = ""
            for (i=1; i<=length(data[n]); i++) {
                printf "%s%s", sep, data[n][i]
                sep = ","
            }
            print ""
        }
    }
' file
john,violin,expert level,math grade,95
doe,piano,novice,math grade,100,extra info,variable length
jane,cello,beginner,physics,90
mary,flute,advanced,chemistry,95,college next year
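Once the file is in data[row][column], individual cells can be addressed directly, and length(data[n]) gives the number of fields in row n. The sketch below illustrates that with the first two lines of the question's file fed in through printf. The function name lookup_demo is made up, and the separator is written with [|] instead of the escaped pipe. Like the code above, it needs gawk 4 or later for arrays of arrays.

```shell
lookup_demo() {
  printf '%s\n' \
    'john | violin | expert level | math grade | 95' \
    'doe | piano | novice | math grade | 100 | extra info | variable length' |
  gawk -F '[[:blank:]]*[|][[:blank:]]*' '
    { for (i = 1; i <= NF; i++) data[NR][i] = $i }   # same population step as above
    END {
      for (n = 1; n <= NR; n++) {
        last = length(data[n])                       # rows can differ in length
        printf "%s plays %s; %d fields, last is \"%s\"\n",
               data[n][1], data[n][2], last, data[n][last]
      }
    }'
}
lookup_demo
```

This should print one line per student, for example: john plays violin; 5 fields, last is "95".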
Upvotes: 1