Reputation: 59
I'm trying to delete from a list every line that matches the basename of a file in a folder.
I have 2,000,000 files in a folder, but there should be 2,500,000. I have a missing.txt file that lists all 2.5M filenames, one per line. I want to delete the lines for the files I already have, so that I can restart my process and generate only the 500,000 missing files.
My very simple script is:
for FILE in ../pdb/*; do
    BNAME="$(basename ${FILE} _mini.pdb)"
    sed "/${BNAME}/d" ./missing.txt
done
The problem is that sed doesn't delete the $BNAME lines in the missing.txt file. What am I doing wrong with sed?
Upvotes: 0
Views: 51
Reputation: 5950
I would use a different approach:
First, create a sorted list of the files you already have: ls ../pdb | sort > new_list.txt. It should contain about 2M rows.
Then sort the full list of 2.5M filenames: sort missing.txt > old_list.txt
Finally, extract the difference: comm -23 old_list.txt new_list.txt. This prints only the lines that appear in old_list.txt but not in new_list.txt.
This is much more efficient than running 2M instances of sed.
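Putting the steps together, here is a minimal sketch rather than a definitive script. It assumes that missing.txt holds bare names without the _mini.pdb suffix (if it holds full filenames, drop the sed step) and that no filename contains a newline. The function name remaining_names and the temporary file names are made up for illustration.

```shell
#!/bin/sh
# Sketch: print the names from a list that have no matching file in a directory.
# Assumption: the list holds bare basenames (no _mini.pdb suffix), one per line.
remaining_names() {
    pdb_dir=$1      # directory with the finished *_mini.pdb files
    all_names=$2    # file listing every expected basename

    tmp_dir=$(mktemp -d) || return 1

    # Names already present, with the suffix stripped so they can be
    # compared against the list.
    ls "$pdb_dir" |
        sed 's/_mini\.pdb$//' |
        LC_ALL=C sort > "$tmp_dir/have.txt"

    # comm needs both inputs sorted with the same collation,
    # hence LC_ALL=C on every sort and on comm itself.
    LC_ALL=C sort "$all_names" > "$tmp_dir/want.txt"

    # -2 drops lines found only in have.txt, -3 drops lines found in both,
    # so only the names that are still missing are printed.
    LC_ALL=C comm -23 "$tmp_dir/want.txt" "$tmp_dir/have.txt"

    rm -rf "$tmp_dir"
}
```

You would call it as remaining_names ../pdb missing.txt > still_missing.txt and feed still_missing.txt to your process. As for the original loop: sed without -i writes the edited text to standard output and leaves missing.txt untouched, which is why no lines appeared to be deleted.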
Upvotes: 1