Reputation: 33
I am trying to automate the periodic detection and elimination of duplicate files, using fdupes. I found this beautiful script:
# from here:
# https://www.techrepublic.com/blog/linux-and-open-source/how-to-remove-duplicate-files-without-wasting-time/
OUTF=rem-duplicates_2019-01.sh;
echo "#! /bin/sh" > $OUTF;
find "$@" -type f -printf "%s\n" | sort -n | uniq -d |
xargs -I@@ -n1 find "$@" -type f -size @@c -exec md5sum {} \; |
sort --key=1,32 | uniq -w 32 -d --all-repeated=separate |
sed -r 's/^[0-9a-f]*( )*//;s/([^a-zA-Z0-9./_-])/\\\1/g;s/(.+)/#rm \1/;' >> $OUTF;
chmod a+x $OUTF; ls -l $OUTF
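The blank lines that separate the groups come from uniq --all-repeated=separate. uniq -w 32 compares only the first 32 characters of each line, which is the MD5 digest. The separate method prints every member of each duplicate group and puts a blank line between groups. Below is a small sketch of that stage on its own. It assumes GNU coreutils, because -w and --all-repeated are GNU options. It also uses plain sort instead of --key=1,32. The function name group_by_hash and the all-"a"/all-"b" digests are made up for illustration:

```shell
#!/bin/sh
# Sketch, assuming GNU coreutils: uniq -w 32 compares only the first 32
# characters (the MD5 hex digest); --all-repeated=separate prints every line
# of each duplicate group and puts a blank line between groups.
group_by_hash() {
  LC_ALL=C sort | uniq -w 32 --all-repeated=separate
}

# Made-up digests standing in for real md5sum output; the "c" line has no
# duplicate, so uniq drops it.
printf '%s\n' \
  'bbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbb  ./directory_a/file_b' \
  'aaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaa  ./directory_a/file_a' \
  'cccccccccccccccccccccccccccccccc  ./directory_a/unique_file' \
  'bbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbb  ./directory_b/file_identical_to_b' \
  'aaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaa  ./directory_b/file_identical_to_a' |
  group_by_hash
```

The sed step of the script then turns each of those lines into a #rm command. The blank separators pass through sed unchanged.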
This produces a file with this structure:
#! /bin/sh
#rm ./directory_a/file_a
#rm ./directory_b/file_identical_to_a

#rm ./directory_a/file_b
#rm ./directory_b/file_identical_to_b
#rm ./directory_c/another_file_identical_to_b

#rm ./directory_a/file_c
#rm ./directory_b/file_identical_to_c
#rm ./directory_c/another_file_identical_to_c
#rm ./directory_d/yet_another_file_identical_to_c
I want to remove the # from the first line of each paragraph (each group of duplicates), so that, for example, the last group becomes:
rm ./directory_a/file_c
#rm ./directory_b/file_identical_to_c
#rm ./directory_c/another_file_identical_to_c
#rm ./directory_d/yet_another_file_identical_to_c
I have been trying to modify the next-to-last line of the script, with variations such as this:
sed -r 's/^[0-9a-f]*( )*//;s/([^a-zA-Z0-9./_-])/\\\1/g;s/(.+)/#rm \1/;s/\n\n#rm/\n\nrm/;' >> $OUTF;
But I cannot get sed to recognize \n\n, or any other marker I can think of for the beginning of a paragraph. What am I doing wrong?
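The \n\n never matches because sed reads its input one line at a time. It also removes each line's trailing newline before running the script, so the pattern space never holds two newlines in a row.

Here is a minimal sketch of one way around that, assuming GNU sed. Its -z option splits input on NUL bytes instead of newlines. An ordinary text file is therefore read as a single pattern space, and the substitution from the question can see the blank lines. The function name uncomment_first_lines is hypothetical. The shebang is matched literally as #! /bin/sh, the exact line the script writes:

```shell
#!/bin/sh
# sed strips the newline from each line it reads, so the pattern space never
# contains "\n\n"; this prints its input unchanged:
printf '#rm ./a\n\n#rm ./b\n' | sed 's/\n\n#rm/\n\nrm/'

# Sketch, assuming GNU sed: -z splits input on NUL bytes, so a text file with
# no NULs arrives as one pattern space and "\n" matches real newlines.
# 1st substitution: the group right after the shebang (no blank line there).
# 2nd substitution: the first line of every group that follows a blank line.
uncomment_first_lines() {
  sed -z 's|^#! /bin/sh\n#rm|#! /bin/sh\nrm|; s/\n\n#rm/\n\nrm/g' "$@"
}

printf '#! /bin/sh\n#rm ./a\n#rm ./b\n\n#rm ./c\n#rm ./d\n' | uncomment_first_lines
```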
Edit: I am unable to edit my comment, so here is the final script:
TEMPF=temp.txt;
OUTF=rem-duplic_2019-01.sh
echo "#! /bin/sh" > $TEMPF;
find "$@" -type f -printf "%s\n" | sort -n | uniq -d |
xargs -I@@ -n1 find "$@" -type f -size @@c -exec md5sum {} \; |
sort --key=1,32 | uniq -w 32 -d --all-repeated=separate |
sed -r 's/^[0-9a-f]*( )*//;s/([^a-zA-Z0-9./_-])/\\\1/g;s/(.+)/#rm \1/' >> $TEMPF;
awk -v a=2 '/^$/{a=2}!--a{sub(/#/,"")}1' $TEMPF > $OUTF
chmod a+x $OUTF; ls -l $OUTF
rm $TEMPF
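To try the awk step on its own, here it is wrapped in a hypothetical function (uncomment_after_blank). The comments trace how the counter a decides which line loses its #:

```shell
#!/bin/sh
# The awk line from the final script, wrapped in a hypothetical function.
# a starts at 2. Every line runs --a; the line on which a reaches 0 gets its
# first "#" removed by sub(). Line 1 (shebang) takes a to 1, line 2 to 0.
# A blank line resets a to 2 and immediately drops it to 1, so the next
# line (the first of the group) takes it to 0. Later lines push a below 0,
# so nothing else is touched. The trailing 1 prints every line.
uncomment_after_blank() {
  awk -v a=2 '/^$/{a=2}!--a{sub(/#/,"")}1' "$@"
}

printf '#! /bin/sh\n#rm ./a\n#rm ./b\n\n#rm ./c\n#rm ./d\n' | uncomment_after_blank
```

Since awk reads standard input when it gets no file operand, the temporary file could probably be dropped. To do that, group the echo of the shebang and the find | ... | sed pipeline in braces and pipe the result straight into this awk command. That variant is untested against the full script.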
Upvotes: 1
Views: 393
Reputation: 8711
Just use Perl in paragraph mode:
perl -00 -pe ' s/^#// '
With this input:
$ cat yozzarian.txt
#! /bin/sh
#rm ./directory_a/file_a
#rm ./directory_b/file_identical_to_a

#rm ./directory_a/file_b
#rm ./directory_b/file_identical_to_b
#rm ./directory_c/another_file_identical_to_b

#rm ./directory_a/file_c
#rm ./directory_b/file_identical_to_c
#rm ./directory_c/another_file_identical_to_c
#rm ./directory_d/yet_another_file_identical_to_c
$ perl -00 -pe ' s/^#// ' yozzarian.txt
! /bin/sh
#rm ./directory_a/file_a
#rm ./directory_b/file_identical_to_a

rm ./directory_a/file_b
#rm ./directory_b/file_identical_to_b
#rm ./directory_c/another_file_identical_to_b

rm ./directory_a/file_c
#rm ./directory_b/file_identical_to_c
#rm ./directory_c/another_file_identical_to_c
#rm ./directory_d/yet_another_file_identical_to_c
$
Upvotes: 0
Reputation: 58473
This might work for you (GNU sed):
sed '/^#!\|^\s*$/{n;s/.//}' file
If the current line is a shebang or an empty line, print it, read the next line, and remove that line's first character (the #).
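Here is the same command wrapped in a hypothetical function so it can be tried on a sample. It assumes GNU sed, since \| alternation and \s are GNU extensions:

```shell
#!/bin/sh
# Sketch, assuming GNU sed (\| and \s are GNU extensions).
# On a shebang or blank line, n prints that line and reads the next one
# into the pattern space; s/.// then deletes its first character, the "#".
uncomment_with_n() {
  sed '/^#!\|^\s*$/{n;s/.//}' "$@"
}

printf '#! /bin/sh\n#rm ./a\n#rm ./b\n\n#rm ./c\n#rm ./d\n' | uncomment_with_n
```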
Upvotes: 0
Reputation: 50785
Use awk instead:
awk '/^$/{a=1} !a--{sub(/#/,"")} 1' a=1 file
/^$/ { a = 1 } means: set a to 1 if the current line is a blank one.
!a-- is shorthand for a-- == 0, and the following action ({ sub(/#/, "") }) removes the first # from the current line.
1 means print all lines.
a=1 is required to remove # from the line after the shebang (i.e. the 2nd line).
Upvotes: 1
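The awk program from the answer above can also be written over several lines, with one comment per rule. This is a sketch of the same logic; the function name uncomment_awk is hypothetical:

```shell
#!/bin/sh
# The awk answer spread over several lines (function name is hypothetical).
# a=1 is an assignment operand, applied before input is read.
uncomment_awk() {
  awk '
    /^$/ { a = 1 }          # blank line: arm the counter again
    !a-- { sub(/#/, "") }   # true only when a was 0 before the decrement
    1                       # print every line
  ' a=1 "$@"
}

printf '#! /bin/sh\n#rm ./a\n#rm ./b\n\n#rm ./c\n#rm ./d\n' | uncomment_awk
```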
Reputation: 7746
You can use this too:
sed '/^$\|^#!/{N;s/#r/r/}' input.txt
Feel free to add the in-place option (-i) if you want.
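Here is a sketch of the in-place variant, assuming GNU sed (-i with no backup suffix and \| alternation are GNU extensions). N appends the next line to the pattern space. s/#r/r/ then removes the first #r it finds. That occurrence is always the start of the appended line, because neither a blank line nor #! /bin/sh contains #r. The function name and the temporary file exist only for the demo:

```shell
#!/bin/sh
# Sketch, assuming GNU sed: -i without a suffix and \| are GNU extensions.
# N appends the next line to a blank or shebang line; s/#r/r/ then removes
# the first "#r", which starts that appended line.
uncomment_in_place() {
  sed -i '/^$\|^#!/{N;s/#r/r/}' "$1"
}

f=$(mktemp)
printf '#! /bin/sh\n#rm ./a\n#rm ./b\n\n#rm ./c\n#rm ./d\n' > "$f"
uncomment_in_place "$f"
cat "$f"
rm -f "$f"
```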
Upvotes: 0