Reputation: 1189

delete lines from multiple files using gawk / awk / sed

I have two sets of text files. The first set is in the AA folder and the second set is in the BB folder. The content of the ff.txt file from the first set (AA folder) is shown below.

Name        number     marks
john            1         60
maria           2         54
samuel          3         62
ben             4         63

I would like to print the second column (number) from this file if marks > 60. The output would be 3 and 4. Next, I want to read the ff.txt file in the BB folder and delete the lines containing the numbers 3 and 4.

The files in the BB folder look like this. The second column is the number.

 marks       1      11.824  24.015  41.220  1.00 13.65
 marks       1      13.058  24.521  40.718  1.00 11.82
 marks       3      12.120  13.472  46.317  1.00 10.62
 marks       4      10.343  24.731  47.771  1.00  8.18

I used the following code. It works perfectly for one file.

gawk 'BEGIN {getline} $3>60{print $2}' AA/ff.txt | while read number; do gawk -v number=$number '$2 != number' BB/ff.txt > /tmp/ff.txt; mv /tmp/ff.txt BB/ff.txt; done

But when I run this code with multiple files, I get an error.

gawk 'BEGIN {getline} $3>60{print $2}' AA/*.txt | while read number; do gawk -v number=$number '$2 != number' BB/*.txt > /tmp/*.txt; mv /tmp/*.txt BB/*.txt; done

Error:
mv: target `BB/kk.txt' is not a directory

I asked this question two days ago. Please help me solve this error.

Upvotes: 2

Answers (3)

Birei

Reputation: 36282

One Perl solution:

use warnings;
use strict;
use File::Spec;

## Hash to save data to delete from files of BB folder.
## key -> file name.
## value -> string with numbers of second column. They will be
## joined separated with '-...-', like: -2--3--1-. And it will be easier to
## search for them using a regexp.
my %delete;

## Check arguments:
## 1.- They are two.
## 2.- Both are directories.
## 3.- Both have same number of regular files and with identical names.
die qq[Usage: perl $0 <dir_AA> <dir_BB>\n] if
        @ARGV != 2 ||
        grep { ! -d } @ARGV;

{
        my %h;
        for ( glob join q[ ], map { qq[$_/*] } @ARGV ) {
                next unless -f;
                my $file = ( File::Spec->splitpath( $_ ) )[2] or next;
                $h{ $file }++;
        }

        for ( values %h ) {
                if ( $_ != 2 ) {
                        die qq[Different files in both directories\n];
                }
        }
}

## Get files from dir 'AA'. Process them, print the lines that
## match the condition and save the information in the %delete hash.
for my $file ( glob( shift . qq[/*] ) ) {
        open my $fh, q[<], $file or do { warn qq[Couldn't open file $file\n]; next };
        $file = ( File::Spec->splitpath( $file ) )[2] or do { 
                warn qq[Couldn't get file name from path\n]; next };
        while ( <$fh> ) {
                next if $. == 1;
                chomp;
                my @f = split;
                next unless @f >= 3;
                if ( $f[ $#f ] > 60 ) {
                        $delete{ $file } .= qq/-$f[1]-/;
                        printf qq[%s\n], $_;
                }
        }
}

## Process files found in dir 'BB'. For each line, print it if not found in
## file from dir 'AA'.
{
        @ARGV  = glob( shift . qq[/*] );
        $^I = q[.bak];
        while ( <> ) {

                ## No numbers were saved for this file (no marks above 60 in
                ## the matching 'AA' file): keep the line unchanged.
                my $filename = ( File::Spec->splitpath( $ARGV ) )[2];
                if ( ! exists $delete{ $filename } ) {
                        print;
                        next;
                }

                chomp;
                my @f = split;
                if ( $delete{ $filename } =~ m/-$f[1]-/ ) {
                        next;
                }

                printf qq[%s\n], $_;
        }
}

exit 0;

A test:

Assume the following tree of files. Command:

ls -R1

Output:

.:
AA
BB
script.pl

./AA:
ff.txt
gg.txt

./BB:
ff.txt
gg.txt

And the following content of the files. Command:

head AA/*

Output:

==> AA/ff.txt <==
Name        number     marks
john            1         60
maria           2         54
samuel          3         62
ben             4         63
==> AA/gg.txt <==
Name        number     marks
john            1         70
maria           2         54
samuel          3         42
ben             4         33

Command:

head BB/*

Output:

==> BB/ff.txt <==
 marks       1      11.824  24.015  41.220  1.00 13.65
 marks       1      13.058  24.521  40.718  1.00 11.82
 marks       3      12.120  13.472  46.317  1.00 10.62
 marks       4      10.343  24.731  47.771  1.00  8.18
==> BB/gg.txt <==
 marks       1      11.824  24.015  41.220  1.00 13.65
 marks       2      13.058  24.521  40.718  1.00 11.82
 marks       3      12.120  13.472  46.317  1.00 10.62
 marks       4      10.343  24.731  47.771  1.00  8.18

Run the script like:

perl script.pl AA/ BB

It prints the following output to the screen:

samuel          3         62
ben             4         63
john            1         70

And the files in the BB directory are modified like this:

head BB/*

Output:

==> BB/ff.txt <==
 marks       1      11.824  24.015  41.220  1.00 13.65
 marks       1      13.058  24.521  40.718  1.00 11.82

==> BB/gg.txt <==
 marks       2      13.058  24.521  40.718  1.00 11.82
 marks       3      12.120  13.472  46.317  1.00 10.62
 marks       4      10.343  24.731  47.771  1.00  8.18

So the lines with numbers 3 and 4 have been deleted from ff.txt, and the lines with number 1 from gg.txt, because those numbers had a value above 60 in the last column of the matching AA file. I think this is what you wanted to achieve. I hope it helps, although it is not awk.

Upvotes: 0

Steve

Reputation: 54592

This builds an index of the numbers from all files in folder AA and checks it against all files in folder BB:

cat AA/*.txt | awk 'FNR==NR { if ($3 > 60) array[$2]; next } !($2 in array)' - BB/*.txt
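To make the FNR==NR idiom in the command above easier to follow, here is a minimal sketch on a cut-down copy of the question's sample data. The scratch directory, the shortened rows and the array name seen are my own assumptions for illustration. I also compare $3+0 rather than $3, which is a change from the command above, so that the header word "marks" is treated as the number 0 instead of being compared as a string.

```shell
# Hypothetical scratch data mirroring the sample files in the question.
work=$(mktemp -d)
mkdir "$work/AA" "$work/BB"
printf '%s\n' 'Name number marks' 'john 1 60' 'maria 2 54' \
    'samuel 3 62' 'ben 4 63' > "$work/AA/ff.txt"
printf '%s\n' 'marks 1 11.824' 'marks 1 13.058' \
    'marks 3 12.120' 'marks 4 10.343' > "$work/BB/ff.txt"

# NR counts records across all inputs; FNR restarts for each input.
# FNR==NR is therefore true only while awk reads the first input (AA),
# where qualifying numbers are stored as array keys. For the second
# input (BB), a line is printed only if its number is not a key.
kept=$(awk 'FNR==NR { if ($3+0 > 60) seen[$2]; next } !($2 in seen)' \
    "$work/AA/ff.txt" "$work/BB/ff.txt")
printf '%s\n' "$kept"
```

With this data only the two BB lines whose number is 1 are printed, since 3 and 4 were stored while reading the AA file.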

This compares the files pairwise, assuming each file has the same name in folders AA and BB:

ls AA/*.txt | sed "s%AA/\(.*\)%awk 'FNR==NR { if (\$3 > 60) array[\$2]; next } !(\$2 in array)' & BB/\1 %" | sh

HTH

EDIT

This should help :-) It writes each result to a temporary file and then moves it over the matching file in BB:

ls AA/*.txt | sed "s%AA/\(.*\)%awk 'FNR==NR { if (\$3 > 60) array[\$2]; next } !(\$2 in array)' & BB/\1 > \1_tmp \&\& mv \1_tmp BB/\1 %" | sh

Upvotes: 1

kev

Reputation: 161974

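> /tmp/*.txt and mv /tmp/*.txt BB/*.txt are wrong. The shell expands each glob into a list of file names before the command runs. mv then receives more than two arguments and requires the last one to be a directory, which is the error you see, and a redirection needs a single file name as its target.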
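A small sketch of what happens to such a glob, in case it helps. The scratch directory and the two file names are assumptions made up for the demonstration; nothing here touches your real BB folder.

```shell
# Hypothetical scratch directory, for illustration only.
demo=$(mktemp -d)
mkdir "$demo/BB"
touch "$demo/BB/ff.txt" "$demo/BB/kk.txt"

# The shell expands the glob before any command runs, so one pattern
# becomes several separate arguments (here: the positional parameters).
set -- "$demo"/BB/*.txt
echo "BB/*.txt expanded to $# arguments"
```

With two matching files the pattern turns into two arguments, so a command such as mv never sees the pattern itself, only the expanded list.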

For a single file:

awk 'NR>1 && $3>60{print $2}' AA/ff.txt > idx.txt

awk 'NR==FNR{a[$0]; next}; !($2 in a)' idx.txt BB/ff.txt

For multiple files:

awk 'FNR>1 && $3>60{print $2}' AA/*.txt >idx.txt

cat BB/*.txt | awk 'NR==FNR{a[$0]; next}; !($2 in a)' idx.txt -
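If each BB file should instead be filtered only by the AA file of the same name and then rewritten in place, a loop over the file names is one possible approach. This is a rough sketch, not a tested drop-in: filter_pairs, its two directory arguments and the array name drop are hypothetical names of mine, and it assumes every AA file has at least its header line (with an empty AA file the FNR==NR test would also match the BB file).

```shell
# Sketch: filter each BB file against the AA file of the same name.
# filter_pairs is a hypothetical helper; it takes the two directories.
filter_pairs() {
    aa_dir=$1
    bb_dir=$2
    for aa_file in "$aa_dir"/*.txt; do
        name=${aa_file##*/}          # strip the directory part
        bb_file=$bb_dir/$name
        # Skip names that have no counterpart in the second directory.
        [ -f "$bb_file" ] || continue
        tmp=$(mktemp) || return 1
        # First input (AA file): skip the header, remember numbers
        # whose marks exceed 60.
        # Second input (BB file): print lines whose number was not stored.
        awk 'FNR==NR { if (FNR > 1 && $3+0 > 60) drop[$2]; next }
             !($2 in drop)' "$aa_file" "$bb_file" > "$tmp" &&
            mv "$tmp" "$bb_file"
    done
}
```

It would be called as filter_pairs AA BB. Each redirection and each mv gets exactly one file name, which avoids the glob problem described above.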

Upvotes: 1
