Reputation: 1189
I have two sets of text files: the first set is in the AA folder and the second set is in the BB folder. The content of ff.txt from the first set (AA folder) is shown below.
Name number marks
john 1 60
maria 2 54
samuel 3 62
ben 4 63
I would like to print the second column (number) from this file where marks > 60. The output would be 3 and 4. Next, I want to read ff.txt in the BB folder and delete the lines containing the numbers 3 and 4.
The files in the BB folder look like this; the second column is the number.
marks 1 11.824 24.015 41.220 1.00 13.65
marks 1 13.058 24.521 40.718 1.00 11.82
marks 3 12.120 13.472 46.317 1.00 10.62
marks 4 10.343 24.731 47.771 1.00 8.18
I used the following code. It works perfectly for one file.
gawk 'BEGIN {getline} $3>60{print $2}' AA/ff.txt | while read number; do gawk -v number=$number '$2 != number' BB/ff.txt > /tmp/ff.txt; mv /tmp/ff.txt BB/ff.txt; done
But when I run this code with multiple files, I get an error.
gawk 'BEGIN {getline} $3>60{print $2}' AA/*.txt | while read number; do gawk -v number=$number '$2 != number' BB/*.txt > /tmp/*.txt; mv /tmp/*.txt BB/*.txt; done
Error:
mv: target `BB/kk.txt' is not a directory
I asked this question two days ago. Please help me solve this error.
Upvotes: 2
Views: 989
Reputation: 36282
One Perl solution:
use warnings;
use strict;
use File::Spec;
## Hash to save data to delete from files of BB folder.
## key -> file name.
## value -> string with numbers of second column. Each number is wrapped
## in hyphens and the results are concatenated, like: -2--3--1-. This makes
## it easier to search for them using a regexp.
my %delete;
## Check arguments:
## 1.- They are two.
## 2.- Both are directories.
## 3.- Both have same number of regular files and with identical names.
die qq[Usage: perl $0 <dir_AA> <dir_BB>\n] if
    @ARGV != 2 ||
    grep { ! -d } @ARGV;
{
    my %h;
    for ( glob join q[ ], map { qq[$_/*] } @ARGV ) {
        next unless -f;
        my $file = ( File::Spec->splitpath( $_ ) )[2] or next;
        $h{ $file }++;
    }
    for ( values %h ) {
        if ( $_ != 2 ) {
            die qq[Different files in both directories\n];
        }
    }
}
## Get files from dir 'AA'. Process them, print the lines that match the
## condition and save the information in the %delete hash.
for my $file ( glob( shift . qq[/*] ) ) {
    open my $fh, q[<], $file or do { warn qq[Couldn't open file $file\n]; next };
    $file = ( File::Spec->splitpath( $file ) )[2] or do {
        warn qq[Couldn't get file name from path\n]; next };
    while ( <$fh> ) {
        next if $. == 1;
        chomp;
        my @f = split;
        next unless @f >= 3;
        if ( $f[ $#f ] > 60 ) {
            $delete{ $file } .= qq/-$f[1]-/;
            printf qq[%s\n], $_;
        }
    }
}
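## Process files found in dir 'BB'. For each line, print it unless its number
## was selected from the file of the same name in dir 'AA'.
{
    @ARGV = glob( shift . qq[/*] );
    $^I = q[.bak];
    while ( <> ) {
        my $filename = ( File::Spec->splitpath( $ARGV ) )[2];
        ## Nothing selected for this file in 'AA' (no marks above 60), so keep
        ## the line untouched. Skipping it would empty the file, because
        ## in-place editing only keeps what is printed.
        if ( ! exists $delete{ $filename } ) {
            print;
            next;
        }
        chomp;
        my @f = split;
        ## Require a second field; an undefined $f[1] would match '--'.
        if ( @f >= 2 && $delete{ $filename } =~ m/-$f[1]-/ ) {
            next;
        }
        printf qq[%s\n], $_;
    }
}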
exit 0;
A test:
Assume the following tree of files. Command:
ls -R1
Output:
.:
AA
BB
script.pl
./AA:
ff.txt
gg.txt
./BB:
ff.txt
gg.txt
And the following file contents. Command:
head AA/*
Output:
==> AA/ff.txt <==
Name number marks
john 1 60
maria 2 54
samuel 3 62
ben 4 63
==> AA/gg.txt <==
Name number marks
john 1 70
maria 2 54
samuel 3 42
ben 4 33
Command:
head BB/*
Output:
==> BB/ff.txt <==
marks 1 11.824 24.015 41.220 1.00 13.65
marks 1 13.058 24.521 40.718 1.00 11.82
marks 3 12.120 13.472 46.317 1.00 10.62
marks 4 10.343 24.731 47.771 1.00 8.18
==> BB/gg.txt <==
marks 1 11.824 24.015 41.220 1.00 13.65
marks 2 13.058 24.521 40.718 1.00 11.82
marks 3 12.120 13.472 46.317 1.00 10.62
marks 4 10.343 24.731 47.771 1.00 8.18
Run the script like:
perl script.pl AA/ BB
With the following output to screen:
samuel 3 62
ben 4 63
john 1 70
And the files of the BB directory are modified like this:
head BB/*
Output:
==> BB/ff.txt <==
marks 1 11.824 24.015 41.220 1.00 13.65
marks 1 13.058 24.521 40.718 1.00 11.82
==> BB/gg.txt <==
marks 2 13.058 24.521 40.718 1.00 11.82
marks 3 12.120 13.472 46.317 1.00 10.62
marks 4 10.343 24.731 47.771 1.00 8.18
So lines with numbers 3 and 4 have been deleted from ff.txt, and lines with number 1 from gg.txt, since all of those had a value greater than 60 in the last column of the corresponding AA file. I think this is what you wanted to achieve. I hope it helps, even though it is not awk.
Upvotes: 0
Reputation: 54592
This creates an index from all files in folder AA and checks it against all files in folder BB:
cat AA/*.txt | awk 'FNR==NR { if ($3 > 60) array[$2]; next } !($2 in array)' - BB/*.txt
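To make the `FNR==NR` idiom above easier to follow, here is a minimal sketch on two throwaway files. The scratch directory, the file names `aa.txt` and `bb.txt`, the array name `seen` and the shortened sample rows are assumptions for illustration only, and the extra `FNR > 1` test (to skip the header row) is not part of the one-liner above.

```shell
# Hypothetical scratch directory and sample data, only for illustration.
work=$(mktemp -d)
printf '%s\n' 'Name number marks' 'john 1 60' 'samuel 3 62' 'ben 4 63' > "$work/aa.txt"
printf '%s\n' 'marks 1 11.824' 'marks 3 12.120' 'marks 4 10.343' > "$work/bb.txt"

# While awk reads the first file, FNR (per-file line count) equals NR
# (total line count), so only the index is built and "next" skips printing.
# For the second file FNR != NR, and the bare pattern prints every line
# whose second field is not in the index.
result=$(awk 'FNR==NR { if (FNR > 1 && $3 > 60) seen[$2]; next } !($2 in seen)' \
    "$work/aa.txt" "$work/bb.txt")

echo "$result"
rm -r "$work"
```

With this sample data only the line for number 1 survives, because 3 and 4 have marks above 60.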
This compares individual pairs of files, assuming they have the same name in folders AA and BB:
ls AA/*.txt | sed "s%AA/\(.*\)%awk 'FNR==NR { if (\$3 > 60) array[\$2]; next } !(\$2 in array)' & BB/\1 %" | sh
HTH
EDIT
To write the filtered result back to each file in BB, redirect to a temporary file and move it into place:
ls AA/*.txt | sed "s%AA/\(.*\)%awk 'FNR==NR { if (\$3 > 60) array[\$2]; next } !(\$2 in array)' & BB/\1 > \1_tmp \&\& mv \1_tmp BB/\1 %" | sh
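If generating commands with sed and piping them to sh feels hard to read, the same per-file pairing can probably be written as a plain loop. This is only a sketch under assumptions: the scratch directory and shortened sample data are made up, the `.tmp` suffix mirrors the `_tmp` file above, and the `FNR > 1` header skip and the empty-file guard are additions that are not in the original command.

```shell
# Hypothetical sample data in a scratch directory, only for illustration.
work=$(mktemp -d)
mkdir "$work/AA" "$work/BB"
printf '%s\n' 'Name number marks' 'john 1 60' 'samuel 3 62' 'ben 4 63' > "$work/AA/ff.txt"
printf '%s\n' 'marks 1 11.824' 'marks 3 12.120' 'marks 4 10.343' > "$work/BB/ff.txt"
printf '%s\n' 'Name number marks' 'john 1 70' 'samuel 3 42' > "$work/AA/gg.txt"
printf '%s\n' 'marks 1 11.824' 'marks 3 12.120' > "$work/BB/gg.txt"

for src in "$work"/AA/*.txt; do
    name=${src##*/}              # strip the directory part, e.g. ff.txt
    target="$work/BB/$name"
    [ -f "$target" ] || continue # no file of the same name in BB
    [ -s "$src" ] || continue    # FNR==NR needs a non-empty first file
    # Filter into a temporary file, then replace the original only on success.
    awk 'FNR==NR { if (FNR > 1 && $3 > 60) drop[$2]; next } !($2 in drop)' \
        "$src" "$target" > "$target.tmp" && mv "$target.tmp" "$target"
done

head "$work"/BB/*.txt
```

Each file in BB is filtered only against the file of the same name in AA, so here number 1 is removed from gg.txt but kept in ff.txt.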
Upvotes: 1
Reputation: 161974
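Both > /tmp/*.txt and mv /tmp/*.txt BB/*.txt are wrong: a redirection takes a single target file, and mv with more than two arguments requires the last one to be a directory, which is where the "is not a directory" error comes from.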
awk 'NR>1 && $3>60{print $2}' AA/ff.txt > idx.txt
awk 'NR==FNR{a[$0]; next}; !($2 in a)' idx.txt BB/ff.txt
awk 'FNR>1 && $3>60{print $2}' AA/*.txt >idx.txt
cat BB/*.txt | awk 'NR==FNR{a[$0]; next}; !($2 in a)' idx.txt -
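To see why the globbed `mv` from the question fails, it can help to look at what the shell passes to `mv` after expanding the wildcards. A small sketch, assuming a scratch directory with two made-up files on each side (the local `tmp` directory stands in for `/tmp`):

```shell
# Hypothetical scratch layout, only for illustration.
work=$(mktemp -d)
mkdir "$work/BB" "$work/tmp"
touch "$work/BB/ff.txt" "$work/BB/kk.txt" "$work/tmp/ff.txt" "$work/tmp/kk.txt"
cd "$work"

# echo shows the argument list mv would receive once the globs are expanded:
# four file names, not two matching pairs.
echo mv tmp/*.txt BB/*.txt

# With more than two arguments, mv expects the last one to be a directory.
# BB/kk.txt is a regular file, so mv refuses and returns a non-zero status.
if mv tmp/*.txt BB/*.txt 2>/dev/null; then
    status=moved
else
    status=failed
fi
echo "$status"
```

The shell never pairs up the names from the two patterns, so anything that must act on matching files needs an explicit loop or a single combined input as in the commands above.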
Upvotes: 1