Grego

Reputation: 59

Deleting multiple filenames from a text file with sed

I'm trying to delete, from a text file, every line that contains the basename of a file that already exists in a folder.

I have 2 000 000 files in a folder, but there should be 2 500 000. I have a missing.txt file that lists all 2.5M filenames, one per line. I want to delete the lines for the files I already have, so that I can restart my process and generate only the 500 000 missing files.

My very simple script is:

for FILE in ../pdb/*; do
BNAME="$(basename ${FILE} _mini.pdb)"
sed "/${BNAME}/d" ./missing.txt
done

The problem is that sed doesn't delete the $BNAME lines in the missing.txt file. What am I doing wrong with sed?

Upvotes: 0

Views: 51

Answers (1)

mauro

Reputation: 5950

I would use a different approach:

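First, create a sorted list of the files you already have: ls ../pdb | sort > new_list.txt. It should contain about 2 million rows. If missing.txt holds the names without the _mini.pdb suffix, strip the suffix from this list as well, otherwise no lines will match.

Then sort the full list of 2.5 million names: sort missing.txt > old_list.txt

Finally, extract the names that appear only in the full list: comm -23 old_list.txt new_list.txt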

This is much more efficient than running 2 million instances of sed, each of which would scan the whole 2.5-million-line file.
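Put together, the three steps could look roughly like the sketch below. It assumes that missing.txt holds one name per line without the _mini.pdb suffix (as the basename call in the question suggests) and that no filename contains a newline. The function name still_missing is made up for illustration, and find is used instead of ls only to make the suffix filter explicit.

```shell
# Hypothetical helper: print every name in list_file that has no
# matching <name>_mini.pdb file in pdb_dir.
still_missing() {
    pdb_dir=$1
    list_file=$2
    have=$(mktemp)
    want=$(mktemp)

    # Names already present: drop the directory part and the suffix.
    find "$pdb_dir" -type f -name '*_mini.pdb' \
        | sed -e 's|.*/||' -e 's|_mini\.pdb$||' \
        | LC_ALL=C sort > "$have"

    # Full list of expected names, sorted with the same collation.
    LC_ALL=C sort "$list_file" > "$want"

    # -2 hides lines found only in "have", -3 hides lines found in both,
    # so only the names unique to "want" are printed.
    LC_ALL=C comm -23 "$want" "$have"

    rm -f "$have" "$want"
}
```

It would be called as still_missing ../pdb missing.txt > todo.txt. Both lists are sorted under LC_ALL=C because comm expects its inputs to be sorted in the same order it uses for comparison. As for the loop in the question: sed writes its result to standard output and leaves missing.txt untouched unless it is told to edit in place (-i in GNU sed).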

Upvotes: 1
