Cocoa Puffs

Reputation: 774

Regex grep file contents and invoke command

I have a generated file that contains MD5 info along with filenames. I want to remove those files from the directory they are in, but I'm not sure exactly how to go about it.

My file (filelist) contains:

MD5 (dupe) = 1fb218dfef4c39b4c8fe740f882f351a
MD5 (somefile) = a5c6df9fad5dc4299f6e34e641396d38

My command (which I would like to combine with rm) looks like this:

grep -o "\((.*)\)" filelist

It returns this:

(dupe)
(somefile)

This is almost right, but the parentheses still need to be removed, and I'm not sure how. I tried a lookbehind/lookahead with grep -Po "(?<=\().*(?=\))" filelist, but the command didn't work.
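One possible reason the lookaround attempt fails is that -P is a GNU grep extension, and the BSD grep shipped on macOS (where md5 prints this "MD5 (name) = hash" format) may not support it. A minimal sketch that avoids grep entirely, assuming every line has exactly the "MD5 (name) = hash" shape, strips the fixed prefix and suffix with POSIX parameter expansion. extract_names is a hypothetical helper name, and as a side benefit it keeps spaces inside filenames:

```shell
#!/bin/sh
# extract_names (hypothetical helper): print the filename from each
# "MD5 (name) = hash" line on stdin, without needing grep -P lookarounds.
extract_names() {
  while IFS= read -r line; do
    name=${line#"MD5 ("}   # remove the literal prefix "MD5 ("
    name=${name%") = "*}   # remove the trailing ") = <hash>"
    printf '%s\n' "$name"
  done
}

# Usage with the two sample lines (for the real file: extract_names < filelist)
extract_names <<'EOF'
MD5 (dupe) = 1fb218dfef4c39b4c8fe740f882f351a
MD5 (somefile) = a5c6df9fad5dc4299f6e34e641396d38
EOF
# prints "dupe", then "somefile"
```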

The next thing I would like to do is take the output filenames and delete them from the directory they are in. I'm not sure how to script it, but it would essentially do:

<returned results from grep>
rm "$target/dupe"
rm "$target/somefile"
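The loop described above could look roughly like this. It is a sketch assuming the names have already been extracted one per line and that $target is the directory holding them; remove_listed is a hypothetical helper, and the demo runs in a throwaway directory so nothing real is deleted:

```shell
#!/bin/sh
# remove_listed DIR (hypothetical helper): read one filename per line on
# stdin and delete DIR/name.
remove_listed() {
  target=$1
  while IFS= read -r name; do
    [ -n "$name" ] || continue       # skip blank lines
    rm -- "$target/$name"            # "--" protects names starting with "-"
  done
}

# Demo in a scratch directory so nothing real is touched.
demo=$(mktemp -d)
touch "$demo/dupe" "$demo/somefile" "$demo/keep"
printf '%s\n' dupe somefile | remove_listed "$demo"
ls "$demo"        # prints: keep
rm -r "$demo"
```

Reading line by line with IFS= read -r keeps names with spaces intact, which a plain for loop over command output would not.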

Upvotes: 0

Views: 157

Answers (2)

Paul Wheeler

Reputation: 20180

The tool you're looking for is xargs: http://unixhelp.ed.ac.uk/CGI/man-cgi?xargs It's pretty standard on *nix systems.

UPDATE: Assuming $target is the directory where the files live...

I believe the syntax would look something like:

yourgrepcmd | xargs -I{} rm "$target/{}"

The -I option defines a placeholder string ({} here); xargs runs rm once for each line of your grep output, substituting that line for the placeholder.
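A handy way to see what the placeholder does before deleting anything is a dry run: put echo in front of rm so xargs prints each command instead of running it. The directory /some/path is only a stand-in for $target:

```shell
#!/bin/sh
# Dry run: echo shows the command xargs would run for each input line.
target=/some/path   # stand-in for the real directory of duplicates
printf '%s\n' dupe somefile | xargs -I{} echo rm "$target/{}"
# rm /some/path/dupe
# rm /some/path/somefile
```

Once the printed commands look right, drop the echo.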

UPDATE:

To remove the parens, you can use sed's substitution command with a capture group (http://unixhelp.ed.ac.uk/CGI/man-cgi?sed).

Something like this:

sed 's/MD5 (\([^)]*\)) .*$/\1/' filelist | xargs -I{} rm "$target/{}"
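To check the sed part on its own, you can feed it the sample lines. In this basic regular expression a bare ( is a literal parenthesis, \( ... \) is the capture group, [^)]* grabs everything up to the first ")", and \1 writes the captured name back. Single quotes keep the shell from touching the backslashes and $:

```shell
#!/bin/sh
# Extract the name between "MD5 (" and the first ")".
printf '%s\n' \
  'MD5 (dupe) = 1fb218dfef4c39b4c8fe740f882f351a' \
  'MD5 (somefile) = a5c6df9fad5dc4299f6e34e641396d38' |
  sed 's/MD5 (\([^)]*\)) .*$/\1/'
# dupe
# somefile
```

Because [^)]* stops at the first ")", a filename that itself contains ")" would be cut short; for names like the ones in the question that isn't an issue.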

The moral of the story: if you learn to use sed and xargs (or awk, if you want something a little more advanced), you'll be a more capable Linux user.

Upvotes: 1

grebneke

Reputation: 4494

If I understand correctly, you want to take lines like these

MD5 (dupe) = 1fb218dfef4c39b4c8fe740f882f351a
MD5 (somefile) = a5c6df9fad5dc4299f6e34e641396d38

extract the second column without the parentheses to get the filenames

dupe
somefile

and then delete the files?

Assuming the filenames don't have spaces, try this:

# this is where your duplicate files are.
dupe_directory='/some/path'

# Check that you found the right files:
awk '{print $2}' file-with-md5-lines.txt | tr -d '()' | xargs -I{} ls -l "$dupe_directory/{}"

# Looks ok, delete:
awk '{print $2}' file-with-md5-lines.txt | tr -d '()' | xargs -I{} rm -v "$dupe_directory/{}"

xargs -I{} tells xargs to run the command once per input line, replacing every {} in the command with that line (a dupe filename), so the name can be embedded in a larger argument such as a path.
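The no-spaces assumption comes from awk '{print $2}', which splits on whitespace. One way to lift it, sketched here as a hypothetical helper (names_from_md5), is to have awk delete the fixed prefix and the ") = hash" suffix instead of picking a column:

```shell
#!/bin/sh
# names_from_md5 (hypothetical helper): print the name between "MD5 (" and
# ") = <hash>", keeping any spaces in it. Reads files given as args, or stdin.
names_from_md5() {
  awk '{ sub(/^MD5 [(]/, ""); sub(/[)] = [0-9a-f]+$/, ""); print }' "$@"
}

printf '%s\n' 'MD5 (my dupe) = 1fb218dfef4c39b4c8fe740f882f351a' | names_from_md5
# my dupe
```

Its output can replace the awk | tr stage, e.g. names_from_md5 file-with-md5-lines.txt | xargs -I{} ls -l "$dupe_directory/{}". With -I, xargs treats each whole line as one argument, so spaces survive; quote or backslash characters in a name would still be interpreted by xargs, though.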

Upvotes: 1
