5r9n

Reputation: 187

Remove every entry listed in one file from another file, along with the line that follows each match

I want to remove every header in one file that is named in a second file, along with the line that follows it. For instance, list2.txt contains:

A
D

list1a.txt contains:

>A
 AAAAA
>B
 GGGGG
>C
 CCCC
>D
 TTTT

I expect the following output:

>B
GGGGG
>C
CCCC

Where >A and >D have been removed along with the lines below them.

I have tried:

input=$1
file_to_edit=$2
while IFS= read -r var
do
echo $var
sed "s/$var//g" $file_to_edit >f2.txt
done < "$input"

f2.txt returns:

>A
AAAAA
>B
GGGGG
>C
CCCC
>
TTTT

This removes the "D" but not the "A", and it does not remove the line below either of them. I need to remove every line of the second file that matches an entry in the first list, together with the line below it.

Upvotes: 1

Views: 139

Answers (3)

Artur Paszek

Reputation: 11

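# Work on a copy so each pass builds on the result of the previous one
cp 2.txt /tmp/temp1
while IFS= read -r var
do
    echo "$var"
    # Note: this strips the matched text only, not the whole line or the line after it
    sed 's,'"$var"',,g' /tmp/temp1 > /tmp/temp2
    mv -f /tmp/temp2 /tmp/temp1
done < 1.txt
cp /tmp/temp1 3.txt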

or

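# Start from an empty script so commands left over from an earlier run are not reused
: > sed.script2
while IFS= read -r var
do
    echo "s/$var//g" >> sed.script2
done < 1.txt
sed --file=sed.script2 2.txt > 3.txt
rm -f sed.script2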

Upvotes: 0

Sundeep

Reputation: 23667

With GNU sed

$ sed 's|.*|/^>&$/,+1d|' f1
/^>A$/,+1d
/^>D$/,+1d
$ sed -f <(sed 's|.*|/^>&$/,+1d|' f1) f2
>B
 GGGGG
>C
 CCCC
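  • the inner sed turns each line of f1 into a delete command; with GNU sed, the address /re/,+n covers the matching line and the n lines after it
  • the outer sed then runs those commands against f2 (the <(...) process substitution needs a shell such as bash)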


If GNU sed is not available, try

$ sed -f <(sed 's|.*|/^>&$/{N;d;}|' f1) f2
>B
 GGGGG
>C
 CCCC
  • here the N command appends the next line to the pattern space, and d then deletes both lines. To delete two following lines use N;N, for three use N;N;N, and so on
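
To illustrate the N;N case, here is a minimal sketch that assumes each header is followed by two lines instead of one. The sample data, the temporary directory, and the file names names.txt, records.txt and delete.sed are made up for this example. The generated commands are written to a file and passed with -f, so no process substitution is needed.

```shell
# Hypothetical sample data: every ">name" header is followed by two lines.
tmp=$(mktemp -d)
printf '%s\n' A D > "$tmp/names.txt"
printf '%s\n' '>A' ' AAAAA' ' aaaaa' \
              '>B' ' GGGGG' ' ggggg' \
              '>D' ' TTTT'  ' tttt' > "$tmp/records.txt"

# One sed command per name: match the header, append the next two
# lines to the pattern space with N;N, then delete all three with d.
sed 's|.*|/^>&$/{N;N;d;}|' "$tmp/names.txt" > "$tmp/delete.sed"

# Apply the generated commands to the records file.
result=$(sed -f "$tmp/delete.sed" "$tmp/records.txt")
printf '%s\n' "$result"

rm -rf "$tmp"
```

With this sample data only the >B header and its two lines remain.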


With awk and getline

$ awk 'NR==FNR{a[">"$0]; next} ($0 in a) && (getline x)>0{next} 1' f1 f2
>B
 GGGGG
>C
 CCCC

If each entry will match at most once in f2, delete it from the array after the match:

awk 'NR==FNR{a[">"$0];next} ($0 in a) && (getline x)>0{delete a[$0];next} 1' f1 f2
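
As a variant of the awk approach above (not the code from this answer), the same filtering can be sketched without getline by counting how many lines are left to skip. The names drop, skip and n, and the temporary directory, are assumptions made for this example. Setting n to a larger value would drop more lines after each matching header.

```shell
# Sample data in a temporary directory (same content as the question's files).
tmp=$(mktemp -d)
printf '%s\n' A D > "$tmp/f1"
printf '%s\n' '>A' ' AAAAA' '>B' ' GGGGG' '>C' ' CCCC' '>D' ' TTTT' > "$tmp/f2"

# n is the number of lines to drop after each matching header.
result=$(awk -v n=1 '
    NR == FNR    { drop[">" $0]; next }  # first file: store each name as a ">name" header
    skip > 0     { skip--; next }        # inside a dropped record: discard this line
    ($0 in drop) { skip = n; next }      # matching header: discard it and the next n lines
                 { print }               # every other line passes through
' "$tmp/f1" "$tmp/f2")
printf '%s\n' "$result"

rm -rf "$tmp"
```

For the sample files this leaves the >B and >C records, matching the output shown above.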

Upvotes: 2

RavinderSingh13

Reputation: 133528

The following awk command may also help.

awk 'FNR==NR{a[$0]=$0;next} /^>/{c=$0;sub(/^>/,"",c)} (c in a){getline;next} 1' list2.txt list1a.txt

Output will be as follows.

>B
 GGGGG
>C
 CCCC

Upvotes: 1
