Reputation: 145
I have two lists, list1 and list2, with a filename on each line. I want a result with all filenames that are only in list2 and not in list1, regardless of specific file extensions (but not all). I'm using bash on Linux and would like to use only commands that do not require any extra installation. In the example lists, I know all the file extensions that I wish to ignore. I made an attempt, but it does not work at all and I don't know how to fix it. Apologies for my inexperience.
I wish to ignore the following extensions: .x .xy .yx .y .jpg
list1.txt
text.x
example.xy
file.yx
data.y
edit
edit.jpg
list2.txt
text
rainbow.z
file
data.y
sunshine
edit.test.jpg
edit.random
result.txt
rainbow.z
sunshine
edit.test.jpg
edit.random
My try:
while read LINE
do
line2=$LINE
sed -i 's/\.x$//g' $LINE $line2
sed -i 's/\.xy$//g' $LINE $line2
sed -i 's/\.yx$//g' $LINE $line2
sed -i 's/\.y$//g' $LINE $line2
then sed -i -e '$line' result.txt;
fi
done < list2.txt
Edit: I forgot two requirements. The filenames can have . in them, and not every filename has an extension. I know the extensions that must be ignored. I amended the lists accordingly.
Upvotes: 1
Views: 147
Reputation: 10133
An awk solution might be more efficient for this task:
awk '
{ f=$0; sub(/\.(xy?|yx?|jpg)$/,"",f) }
NR==FNR { a[f]; next }
!(f in a)
' list1.txt list2.txt > result.txt
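As a quick check, the sketch below recreates the sample lists from the question in a scratch directory and runs the same awk program on them. The temporary directory and the use of printf to build the files are assumptions made purely for illustration.

```shell
# Recreate the question's sample lists in a scratch directory (illustrative only).
tmp=$(mktemp -d)
printf '%s\n' text.x example.xy file.yx data.y edit edit.jpg > "$tmp/list1.txt"
printf '%s\n' text rainbow.z file data.y sunshine edit.test.jpg edit.random > "$tmp/list2.txt"

# For every line, f is the name with one ignored extension stripped.
# While reading list1 (NR == FNR), remember each stripped name.
# While reading list2, print the original line if its stripped name was not seen.
awk '
  { f = $0; sub(/\.(xy?|yx?|jpg)$/, "", f) }
  NR == FNR { a[f]; next }
  !(f in a)
' "$tmp/list1.txt" "$tmp/list2.txt" > "$tmp/result.txt"

cat "$tmp/result.txt"
```

With the sample lists this prints rainbow.z, sunshine, edit.test.jpg and edit.random, in list2 order and with their original extensions intact, because the stripped name is only used for the lookup.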
Upvotes: 3
Reputation: 16868
comm can do precisely this.
You can preprocess the input (comm expects sorted input):
ss()( sed 's/\.\(x\|xy\|yx\|y\|jpg\)$//' "$@" | sort -u )
comm -13 <(ss list1.txt) <(ss list2.txt) >result.txt
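A variation on the same idea, sketched here with temporary files instead of process substitution and with sed -E (accepted by both GNU and BSD sed) instead of the escaped alternation. The helper name strip_sort, the scratch directory and the LC_ALL=C setting are assumptions for illustration; the fixed locale just keeps sort and comm agreeing on the ordering.

```shell
# Recreate the question's sample lists in a scratch directory (illustrative only).
tmp=$(mktemp -d)
printf '%s\n' text.x example.xy file.yx data.y edit edit.jpg > "$tmp/list1.txt"
printf '%s\n' text rainbow.z file data.y sunshine edit.test.jpg edit.random > "$tmp/list2.txt"

# strip_sort (hypothetical helper name): drop one ignored extension per line,
# then sort and de-duplicate, since comm requires sorted input.
strip_sort() { sed -E 's/\.(x|xy|yx|y|jpg)$//' "$@" | LC_ALL=C sort -u; }

strip_sort "$tmp/list1.txt" > "$tmp/list1.stripped"
strip_sort "$tmp/list2.txt" > "$tmp/list2.stripped"

# -1 hides lines only in the first file, -3 hides lines common to both,
# leaving the lines that appear only in the second file.
LC_ALL=C comm -13 "$tmp/list1.stripped" "$tmp/list2.stripped" > "$tmp/result.txt"

cat "$tmp/result.txt"
```

Note that comm compares and prints the preprocessed names, so the result is sorted and edit.test.jpg shows up as edit.test. If the original names and order matter, the comparison has to keep the unstripped line alongside the stripped key.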
Your code was:
while read LINE
do
line2=$LINE
sed -i 's/\.x$//g' $LINE $line2
sed -i 's/\.xy$//g' $LINE $line2
sed -i 's/\.yx$//g' $LINE $line2
sed -i 's/\.y$//g' $LINE $line2
then sed -i -e '$line' result.txt;
fi
done < list2.txt
Some issues that immediately jump out:
- there is a then/fi but no matching if
- list1 is never referenced
- while read ... sed ... sed ... sed ... is inefficient: it makes multiple invocations of sed instead of just one, and adds a loop that sed would perform implicitly
- sed expects file arguments, not strings
- sed -i will try to overwrite its input file arguments
- result.txt is used as both input and output to sed, but nothing ever assigns any contents to it
- data ($line) is passed as sed commands, instead of sed commands being applied to that data
- sed -i -e '$line' will attempt to run a (non-existent) sed command line on the last line of input ($)
- the g option to s/// does nothing when the search is anchored
Upvotes: 3
Reputation: 52644
I'd use join:
$ join -t. -j1 -v2 -o 2.1,2.2 <(sort list1.txt) <(sort list2.txt) | sed 's/\.$//'
rainbow.z
sunshine
(The bit of sed is needed to turn sunshine. into sunshine)
Upvotes: 2