darwin

Reputation: 25

Deleting lines from text files, based on the last character of entries in another file, using awk or sed

I have a file, xx.txt, like this.

 1PPYA
 2PPYB
 1GBND
 1CVHA

The first line of this file is "1PPYA". I would like to

  1. Read the last character of "1PPYA". In this example, it's "A".
  2. Find "1PPY.txt" (named after the first four characters) in the "yy" directory.
  3. Delete the lines that start with "csh" and contain the "A" character.

Given the following "1PPY.txt" in the "yy" directory:

 csh    1      A   1      27.704   6.347   
 csh    2      A   1      28.832   5.553  
 csh    3      A   1      28.324   4.589 
 csh    4      B   1      27.506   3.695  
 csh    5      C   1      29.411   4.842 
 csh    6      A   1      28.378   4.899  

The required output would be:

csh  4      B   1      27.506   3.695
csh  5      C   1      29.411   4.842 

Upvotes: 0

Views: 274

Answers (4)

Kaz

Reputation: 58666

TXR:

@(next "xx.txt")
@(collect)
@*prefix@{suffix /./}
@  (next `yy/@prefix.txt`)
@  (collect)
@    (all)
@{whole-line}
@    (and)
@      (none)
@shell @num @suffix @(skip)
@      (end)
@    (end)
@  (do (put-string whole-line) (put-string "\n"))
@  (end)
@(end)

Run:

$ txr del.txr
csh    4      B   1      27.506   3.695  
csh    5      C   1      29.411   4.842 
txr: unhandled exception of type file_error:
txr: (del.txr:5) could not open yy/2PPY.txt (error 2/No such file or directory)

Because of the outer @(collect)/@(end) pair (easily removed), this processes every line of xx.txt, not just the first, so it fails on the second line because I don't have a 2PPY.txt.

Upvotes: 0

glenn jackman

Reputation: 247230

Assuming your shell is bash:

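while read -r word; do
    # first four characters name the file, the fifth is the letter
    if [[ $word =~ ^(....)(.)$ ]]; then
        filename="yy/${BASH_REMATCH[1]}.txt"
        letter=${BASH_REMATCH[2]}
        if [[ -f "$filename" ]]; then
            # print the file minus the csh lines containing the letter
            sed "/^csh.*$letter/d" "$filename"
        fi
    fi
done < xx.txt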

As you've tagged the question with awk:

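awk '{
    filename = "yy/" substr($1,1,4) ".txt"
    letter = substr($1,5)
    # test for > 0: getline returns -1 if the file cannot be read,
    # which would otherwise loop forever
    while ((getline line < filename) > 0)
        if (! match(line, "^csh.*" letter))
            print line
    close(filename)
}' xx.txt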
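
Note that the pattern "^csh.*" followed by the letter matches that letter anywhere after "csh", not only in the chain column. If the letter always sits in the third whitespace-separated field, as in the sample 1PPY.txt, a stricter comparison is possible. A minimal sketch under that assumption; the function name filter_chain is made up for illustration:

```shell
# filter_chain is a hypothetical helper name, not part of the answer above.
# Assumption: the letter to match is the third whitespace-separated field.
filter_chain() {
    # $1 = path to xx.txt, $2 = directory holding the per-entry files
    while read -r word; do
        [ -n "$word" ] || continue       # skip blank lines
        letter=${word#????}              # what remains after the first four characters
        file="$2/${word%"$letter"}.txt"  # first four characters name the file
        [ -f "$file" ] || continue       # skip entries with no matching file
        # print every line except csh lines whose third field equals the letter
        awk -v l="$letter" '!($1 == "csh" && $3 == l)' "$file"
    done < "$1"
}
```

It would be called as `filter_chain xx.txt yy`. Entries whose file is missing (2PPY.txt in the question's setup, for example) are skipped silently.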

Upvotes: 1

potong

Reputation: 58578

This might work for you:

 sed 's|^ *\(.*\)\(.\)$|sed -i.bak "/^ *csh.*\2/d" yy/\1.txt|' xx.txt | sh

N.B. I added a file backup. If it is not needed, change -i.bak to -i.
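
Because the one-liner pipes generated commands straight into sh, it may be worth inspecting them first. A minimal sketch: leaving off the trailing "| sh" prints the commands without running them. The function name preview_cmds is made up for illustration:

```shell
# preview_cmds is a hypothetical helper name; it only prints the generated
# sed commands and does not touch anything under yy/.
preview_cmds() {
    # $1 = path to xx.txt
    sed 's|^ *\(.*\)\(.\)$|sed -i.bak "/^ *csh.*\2/d" yy/\1.txt|' "$1"
}
```

For an xx.txt line of " 1PPYA", `preview_cmds xx.txt` prints `sed -i.bak "/^ *csh.*A/d" yy/1PPY.txt`. Once the output looks right, pipe it to sh as in the original command.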

Upvotes: 0

anubhava

Reputation: 786349

You can use this bash script:

while read -r f l
do
   [[ -f $f ]] && awk -v l="$l" '$3 != l' "$f"
done < <(awk '{len=length($1); l=substr($1,len); f=substr($1,1,len-1); print "yy/" f ".txt", l}' xx.txt)

I posted this because you are a new user; however, it would be much better to show us what you have tried and where you're stuck.

Upvotes: 0
