Reputation: 187
I want to remove every entry listed in one file from another file, along with the line that follows each match. For instance, list2.txt contains:
A
D
list1a.txt contains:
>A
AAAAA
>B
GGGGG
>C
CCCC
>D
TTTT
I expect the following output:
>B
GGGGG
>C
CCCC
Here >A and >D have been removed, along with the line below each of them.
I have tried:
input=$1
file_to_edit=$2
while IFS= read -r var
do
echo $var
sed "s/$var//g" $file_to_edit >f2.txt
done < "$input"
f2.txt returns:
>A
AAAAA
>B
GGGGG
>C
CCCC
>
TTTT
As expected, the "D" is removed, but the "A" is not, and neither are the lines below them. I need to remove every line of the second file that matches an entry in the first list, as well as the line below it.
Upvotes: 1
Views: 139
Reputation: 11
cp 2.txt /tmp/temp1
while read var
do
echo $var
sed 's,'"$var"',,g' /tmp/temp1 > /tmp/temp2
mv -f /tmp/temp2 /tmp/temp1
done < 1.txt
cp /tmp/temp1 3.txt
or
while read var
do
echo "s/"$var"//g" >> sed.script2
done < 1.txt
sed --file=sed.script2 2.txt > 3.txt
rm -f sed.script2
Upvotes: 0
Reputation: 23667
With GNU sed
$ sed 's|.*|/^>&$/,+1d|' f1
/^>A$/,+1d
/^>D$/,+1d
$ sed -f <(sed 's|.*|/^>&$/,+1d|' f1) f2
>B
GGGGG
>C
CCCC
+n means n lines after the matching line.
If GNU sed is not available, try
$ sed -f <(sed 's|.*|/^>&$/{N;d;}|' f1) f2
>B
GGGGG
>C
CCCC
The N command appends the next line to the pattern space; d then deletes both lines. To remove two more lines use N;N, for three use N;N;N, and so on.
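As an illustration of the N;N variant, here is a minimal sketch that assumes every record is a header followed by exactly two data lines. The file names ids.txt, records.txt and delete.sed and the sample data are made up for this example, and the IDs are assumed to contain no regex metacharacters.

```shell
# Hypothetical sample data: each record is a header plus two data lines.
tmpdir=$(mktemp -d)
printf '%s\n' A D > "$tmpdir/ids.txt"
printf '%s\n' '>A' AAAAA aaaaa '>B' GGGGG ggggg '>C' CCCC cccc '>D' TTTT tttt > "$tmpdir/records.txt"

# Turn every ID into one sed command: on a matching header, append the
# next two lines to the pattern space with N;N, then delete all three.
sed 's|.*|/^>&$/{N;N;d;}|' "$tmpdir/ids.txt" > "$tmpdir/delete.sed"

# Apply the generated script; a script file is used instead of <(...)
# so this also runs in shells without process substitution.
result=$(sed -f "$tmpdir/delete.sed" "$tmpdir/records.txt")
printf '%s\n' "$result"

rm -rf "$tmpdir"
```

With this sample input, only the >B and >C records should remain.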
With awk and getline
$ awk 'NR==FNR{a[">"$0]; next} ($0 in a) && (getline x)>0{next} 1' f1 f2
>B
GGGGG
>C
CCCC
If each entry matches at most once in f2, delete it from the array after the first match:
awk 'NR==FNR{a[">"$0];next} ($0 in a) && (getline x)>0{delete a[$0];next} 1' f1 f2
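For readers who find the one-liners dense, the same idea can be sketched with a skip counter in place of getline. This is a variant, not the command above: the names ids and skip are chosen here for illustration, and the sample files simply recreate the question's data as f1 and f2.

```shell
# Recreate the question's sample data in a scratch directory.
tmpdir=$(mktemp -d)
printf '%s\n' A D > "$tmpdir/f1"
printf '%s\n' '>A' AAAAA '>B' GGGGG '>C' CCCC '>D' TTTT > "$tmpdir/f2"

result=$(awk '
    NR == FNR { ids[">" $0]; next }   # first file: store each ID as a header line
    skip > 0  { skip--; next }        # drop the line that follows a removed header
    $0 in ids { skip = 1; next }      # header to remove: drop it and arm the counter
    { print }                         # everything else passes through
' "$tmpdir/f1" "$tmpdir/f2")
printf '%s\n' "$result"

rm -rf "$tmpdir"
```

Setting skip to 2 or 3 instead of 1 would drop that many following lines, which plays the same role as N;N in the sed version.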
Upvotes: 2
Reputation: 133528
The following awk command may also help.
awk 'FNR==NR{a[$0]=$0;next} /^>/{c=$0;sub(/^>/,"",c)} (c in a){getline;next} 1' list2.txt list1a.txt
Output will be as follows.
>B
GGGGG
>C
CCCC
Upvotes: 1