Reputation: 13
I have multiple folders Case-1, Case-2....Case-N and they all have a file named PPD. I want to extract all 2nd columns and put them into one file named 123.dat. It seems that I cannot use awk in a for loop.
case=$1
for (( i = 1; i <= $case ; i ++ ))
do
file=Case-$i
cp $file/PPD temp$i.dat
awk 'FNR==1{f++}{a[f,FNR]=$2}
END
{for(x=1;x<=FNR;x++)
{for(y=1;y<ARGC;y++)
printf("%s ",a[y,x]);print ""} }'
temp$i.dat >> 123.dat
done
Now 123.dat only has the date of the last PPD in Case-N
I know I can use join(I used that command before) if every PPD file has at least one column the same, but it turns out to be extremely slow if I have lots of Case folders
Upvotes: 0
Views: 147
Reputation: 1840
The below AWK
program can help you.
#!/usr/bin/awk -f
BEGIN {
# Defaults
nrecord=1
nfiles=0
}
BEGINFILE {
# Check if the input file is accessible,
# if not skip the file and print error.
if (ERRNO != "") {
print("Error: ",FILENAME, ERRNO)
nextfile
}
}
{
# Check if the file is accessed for the first time
# if so then increment nfiles. This is to keep count of
# number of files processed.
if ( FNR == 1 ) {
nfiles++
} else if (FNR > nrecord) {
# Fetching the maximum size of the record processed so far.
nrecord=FNR
}
# Fetch the second column from the file.
array[nfiles,FNR]=$2
}
END {
# Iterate through the array and print the records.
for (i=1; i<=nrecord; i++) {
for (j=1; j<=nfiles; j++) {
printf("%5s", array[j,i])
}
print ""
}
}
Output:
$ ./get.awk Case-*/PPD
1 11 21
2 12 22
3 13 23
4 14 24
5 15 25
6 16 26
7 17 27
8 18 28
9 19 29
10 20 30
Here the Case*/PPD
expands to Case-1/PPD
, Case-2/PPD
, Case-3/PPD
and so on. Below are the source files for which the output was generated.
$ cat Case-1/PPD
1 1 1 1
2 2 2 2
3 3 3 3
4 4 4 4
5 5 5 5
6 6 6 6
7 7 7 7
8 8 8 8
9 9 9 9
10 10 10 10
$ cat Case-2/PPD
11 11 11 11
12 12 12 12
13 13 13 13
14 14 14 14
15 15 15 15
16 16 16 16
17 17 17 17
18 18 18 18
19 19 19 19
20 20 20 20
$ cat Case-3/PPD
21 21 21 21
22 22 22 22
23 23 23 23
24 24 24 24
25 25 25 25
26 26 26 26
27 27 27 27
28 28 28 28
29 29 29 29
30 30 30 30
Upvotes: 2
Reputation: 4353
The interaction between the outer shell script and inner awk
invocation aren't working the way you expect.
Every time through the loop, the shell script calls awk
a new time, which means that f
will be unset, and then that first clause will set it to 1
. It will never become 2
. That is, you are starting a new awk
process for each iteration through the outer loop, and awk
is starting from scratch each time.
There are other ways to structure your code, but as a minimal tweak, you can pass in the number $i
to the awk
invocation using the -v
option, e.g. awk -v i="$i" ...
.
Note that there are better ways to structure your overall solution, as other answerers have already suggested; I meant this response to be an answer the question, "Why doesn't this work?" and not "Please rewrite this code."
Upvotes: 2
Reputation: 189357
Maybe
eval paste $(printf ' <(cut -f2 %s)' Case-*/PPD)
There is probably a limit to how many process substitutions you can perform in one go. I did this with 20 columns and it was fine. Process substitutions are a Bash feature, so not portable to other Bourne-compatible shells in general.
The wildcard will be expanded in alphabetical order. If you want the cases in numerical order, maybe use case-[1-9] case-[1-9][0-9] case-[1-9][0-9][0-9]
to force the expansion to get the single digits first, then the double digits, etc.
Upvotes: 2