Reputation: 69
I'm using an Awk script to split a big text document into independent files. I did it and now I'm working with 14k text files. The problem here is there are a lot of files with just three lines of text and it's not useful for me to keep them.
I know I can delete lines in a text with awk 'NF>=3' file
, but I don't want to delete lines inside files, rather I want to delete files which content is just two or three text lines.
Thanks in advance.
Upvotes: 2
Views: 918
Reputation: 8711
You can try Perl. The below solution will be efficient as the file handle ARGV will be closed if the line count > 3
perl -nle ' close(ARGV) if ($.>3) ; $kv{$ARGV}++; END { for(sort keys %kv) { print if $kv{$_}>3 } } ' *
If you want to pipe the output of some other command (say find) you can use it like
$ find . -name "*" -type f -exec perl -nle ' close(ARGV) if ($.>3) ; $kv{$ARGV}++; END { for(sort keys %kv) { print if $kv{$_}>3 } } ' {} \;
./bing.fasta
./chris_smith.txt
./dawn.txt
./drcatfish.txt
./foo.yaml
./ip.txt
./join_tab.pl
./manoj1.txt
./manoj2.txt
./moose.txt
./query_ip.txt
./scottc.txt
./seats.ksh
./tane.txt
./test_input_so.txt
./ya801.txt
$
the output of wc -l * on the same directory
$ wc -l *
12 bing.fasta
16 chris_smith.txt
8 dawn.txt
9 drcatfish.txt
3 fileA
3 fileB
13 foo.yaml
3 hubbs.txt
8 ip.txt
19 join_tab.pl
6 manoj1.txt
6 manoj2.txt
5 moose.txt
17 query_ip.txt
3 rororo.txt
5 scottc.txt
22 seats.ksh
1 steveman.txt
4 tane.txt
13 test_input_so.txt
24 ya801.txt
200 total
$
Upvotes: 1
Reputation: 8406
If the files in the current directory are all text files, this should be efficient and portable:
for f in *; do
[ $(head -4 "$f" | wc -l) -lt 4 ] && echo "$f"
done # | xargs rm
Inspect the list, and if it looks OK, then remove the #
on the last line to actually delete the unwanted files.
Why use head -4
? Because wc
doesn't know when to quit. Suppose half of the text files were each more than a terabyte long; if that were the case wc -l
alone would be quite slow.
Upvotes: 3
Reputation: 133630
Could you please try following find
command.(tested with GNU awk
)
find /your/path/ -type f -exec awk -v lines=3 'NR>lines{f=1; exit} END{if (!f) print FILENAME}' {} \;
So above will print file names who are having lesser than 3 lines on console. Once you are happy with results coming then try following to delete them. Only once you are ok with above command's output run following and even I will suggest run below command in a test directory first and once you are fully satisfied then proceed with below one.(remove echo
from below I have still put it for safer side :) )
find /your/path/ -type f -exec awk -v lines=3 'NR>lines{f=1; exit} END{exit !f}' {} \; -exec echo rm -f {} \;
Upvotes: 3
Reputation: 518
You may use wc
to calculate lines and then decide either to delete the file or not. you should write a shell script instead of just awk
command.
Upvotes: 1