Reputation: 157
Suppose a directory contains three files, File 1, File 2 and File 3, all with the same header names. Is it possible in awk to compare them cell by cell and write out how often each cell differs?
File 1
C1 C2 C3 C4
a d a d
a d a d
a d a d
File 2
C1 C2 C3 C4
a d a d
a v a d
a d a d
File 3
C1 C2 C3 C4
a d r d
a f a d
a d a d
Step 1: compare File 1 and File 2.
Temp.output
C1 C2 C3 C4
0 0 0 0
0 1 0 0
0 0 0 0
Then compare File 2 and File 3, and overwrite Temp.output with the accumulated frequency:
Final.Output
C1 C2 C3 C4
0 0 1 0
0 2 0 0
0 0 0 0
The original directory may contain many files, and I want them processed in order, i.e. File1.txt with file2.txt, then file2.txt with file3.txt.
Upvotes: 0
Views: 130
Reputation: 829
I suggest converting each input file into a single line. After that, awk
is easy to apply.
The paste -s <file>
command is your ally. Below you can see how to list your files in sorted order and convert each one to a line:
$ cat File1.txt
C1 C2 C3 C4
a d a d
a d a d
a d a d
$ ls
File1.txt File2.txt File3.txt
$ ls | sort
File1.txt
File2.txt
File3.txt
$ ls | sort | xargs -L 1 -I {} /bin/bash -c 'echo -n {}" "; paste -s {}'
File1.txt C1 C2 C3 C4 a d a d a d a d a d a d
File2.txt C1 C2 C3 C4 a d a d a v a d a d a d
File3.txt C1 C2 C3 C4 a d r d a f a d a d a d
$
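The same flattening step can be sketched without parsing the output of ls. This is only an illustration: flatten_files is a hypothetical helper name, and it assumes the inputs match *.txt in a single directory and that the shell glob order is the order you want.

```shell
# flatten_files DIR  (hypothetical helper, not part of the original answer)
# Prints one line per file: the file name followed by all of its fields.
flatten_files() {
    dir=$1
    for f in "$dir"/*.txt; do            # the glob expands in sorted order
        [ -f "$f" ] || continue          # skip the pattern itself if nothing matched
        # paste -s joins all lines of the file; -d " " uses a space as separator
        printf '%s %s\n' "$(basename "$f")" "$(paste -s -d ' ' "$f")"
    done
}
```

Calling flatten_files . in the directory shown above should give the same three lines as the xargs pipeline, with single spaces between fields.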
Once each file is a single line, you can use awk to iterate over the fields (NF
tells you how many there are). We will use several rules.
For every line, compare the field at position i
with the previously saved value and increment the corresponding result when they differ. The (NR != 1)
selector skips this comparison for the first line.
(NR != 1) { for (i = 1; i <= NF; i++) { if (last[i] != $i) { result[i]++; } } }
In the same awk
call, include the rule that updates the array where you keep the last values:
{ for (i = 1; i <= NF; i++) { last[i] = $i } }
Finally, print the file name and the current state of the results:
{ printf("%s", $1); for (i = 1; i <= NF; i++) { printf(" %d", result[i]) } print "" }
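For readability, the three rules can be laid out as one commented awk program. This is a sketch of the same logic; the wrapper name count_changes is hypothetical, and it expects the flattened lines (file name followed by fields) on standard input.

```shell
# count_changes  (hypothetical wrapper name)
# Reads flattened lines on stdin and prints running change counters.
count_changes() {
    awk '
    # From the second line on, bump the counter of every field that
    # differs from the same field on the previous line.
    NR != 1 {
        for (i = 1; i <= NF; i++)
            if (last[i] != $i)
                result[i]++
    }
    # Remember the current line for the next comparison.
    {
        for (i = 1; i <= NF; i++)
            last[i] = $i
    }
    # Print the file name followed by the running counters.
    {
        printf "%s", $1
        for (i = 1; i <= NF; i++)
            printf " %d", result[i]
        print ""
    }'
}
```

Note that the first counter always tracks the file name itself, so it grows by one on every line after the first.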
Here is the whole command:
$ ls | sort | xargs -L 1 -I {} /bin/bash -c 'echo -n {}" "; paste -s {}' | awk '(NR != 1) { for (i = 1; i <= NF; i++) { if (last[i] != $i) { result[i]++; } } } { for (i = 1; i <= NF; i++) { last[i] = $i } } { printf("%s", $1); for (i = 1; i <= NF; i++) { printf(" %d", result[i]) } print "" }'
File1.txt 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0
File2.txt 1 0 0 0 0 0 0 0 0 0 1 0 0 0 0 0 0
File3.txt 2 0 0 0 0 0 0 1 0 0 2 0 0 0 0 0 0
$
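This output starts with the filename, followed by the accumulated difference count for every field of the flattened line: the filename itself, the four header fields, and then the twelve data cells.
You can format that back into rows with awk,
inserting newlines where appropriate: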
awk '{ print ""; printf("%s", $1); for (i = 7; i <= NF; i++) { if (((i - 7) % 4) == 0) print "" ; printf(" %d", $i) } print "" }'
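The constants 7 and 4 in that command follow from the layout: field 1 is the file name, field 2 is the counter for the file name, the next four fields are the header counters, so the data counters start at field 7 and wrap every 4 columns. A sketch that takes the column count as a parameter instead might look like this; reshape and cols are hypothetical names, and unlike the original command this variant prints no leading blank line or leading spaces.

```shell
# reshape [COLS]  (hypothetical helper; COLS defaults to 4)
# Turns each counter line back into a block of COLS-wide rows.
reshape() {
    awk -v cols="${1:-4}" '
    {
        first = cols + 3      # skip name, name counter and header counters
        print $1              # file name on its own line
        line = ""
        for (i = first; i <= NF; i++) {
            sep = (line == "") ? "" : " "
            line = line sep $i
            # emit a row once it holds cols values
            if ((i - first + 1) % cols == 0) {
                print line
                line = ""
            }
        }
        if (line != "")       # flush an incomplete last row, if any
            print line
    }'
}
```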
Here you have a complete run:
$ ls | sort | xargs -L 1 -I {} /bin/bash -c 'echo -n {}" "; paste -s {}' | awk '(NR != 1) { for (i = 1; i <= NF; i++) { if (last[i] != $i) { result[i]++; } } } { for (i = 1; i <= NF; i++) { last[i] = $i } } { printf("%s", $1); for (i = 1; i <= NF; i++) { printf(" %d", result[i]) } print "" }' | awk '{ print ""; printf("%s", $1); for (i = 7; i <= NF; i++) { if (((i - 7) % 4) == 0) print "" ; printf(" %d", $i) } print "" }'
File1.txt
0 0 0 0
0 0 0 0
0 0 0 0
File2.txt
0 0 0 0
0 1 0 0
0 0 0 0
File3.txt
0 0 1 0
0 2 0 0
0 0 0 0
$
Upvotes: 1
Reputation: 283
Please find the awk script below. Note that row=4 counts the header line as well.
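#!/bin/bash
# row = lines per file (header included), col = fields per line
awk -v row=4 -v col=4 '
{
    x = (NR - 1) % row;                  # line index within the current file (0 = header)
    for (i = 1; i <= NF; i++) {
        if (a[x, i] != $i) {             # cell differs from the previous file
            a[x, i] = $i;
            count[x, i]++;               # the first file also counts once
        }
    }
}
END {
    for (i = 1; i <= row - 1; i++) {     # skip the header (index 0)
        for (j = 1; j <= col; j++) {
            printf "%d ", count[i, j] - 1;   # drop the count added by the first file
        }
        printf "\n";
    }
}' /tmp/file*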
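If the number of rows and columns is not known in advance, a variant can derive them from the input by using FNR, the line number within the current file. This is a sketch under the assumption that every file has one header line and the same shape; count_by_position is a hypothetical name.

```shell
# count_by_position FILE...  (hypothetical helper)
# Counts, per cell, how many times the value changes from one file to the next.
count_by_position() {
    awk '
    FNR == 1 { next }                    # skip the header line of each file
    {
        r = FNR - 1                      # data row number within the file
        if (r > rows) rows = r
        if (NF > cols) cols = NF
        for (i = 1; i <= NF; i++) {
            # count a change only if an earlier file supplied this cell
            if ((r, i) in prev && prev[r, i] != $i)
                count[r, i]++
            prev[r, i] = $i
        }
    }
    END {
        for (r = 1; r <= rows; r++) {
            for (i = 1; i <= cols; i++)
                printf "%s%d", (i > 1 ? " " : ""), count[r, i]
            print ""
        }
    }' "$@"
}
```

For example, count_by_position /tmp/file* would process the files in glob order. Because the first file is never compared against anything, no subtraction of 1 is needed at print time.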
The script below prints the running counts after every input line; cells that have not been read yet show -1:
#!/bin/bash
# Same counting as above, but the matrix is also printed after every input line
awk -v row=4 -v col=4 '
# Print the current matrix of change counts (header row excluded).
function dump(    i, j) {
    for (i = 1; i <= row - 1; i++) {
        for (j = 1; j <= col; j++) {
            printf "%d ", count[i, j] - 1;
        }
        printf "\n";
    }
}
{
    x = (NR - 1) % row;                  # line index within the current file (0 = header)
    for (i = 1; i <= NF; i++) {
        if (a[x, i] != $i) {
            a[x, i] = $i;
            count[x, i]++;
        }
    }
    dump();
    print "***********";
}
END {
    dump();
}' /tmp/stack/file*
Upvotes: 0