Daniel

Reputation: 309

Linux: delete files that don't contain a specific number of lines

How do I remove files inside a directory that have more or fewer lines than a specified count (all files have a ".txt" suffix)?

Upvotes: 6

Views: 3121

Answers (7)

Kevin Ivarsen

Reputation: 1029

This bash script should do the trick. Save as "rmlc.sh".

Sample usage:

rmlc.sh -more 20 *.txt   # Remove all .txt files with more than 20 lines
rmlc.sh -less 15 *       # Remove ALL files with fewer than 15 lines

Note that if the rmlc.sh script is in the current directory, it is protected against deletion.


#!/bin/sh

# rmlc.sh - Remove by line count

SCRIPTNAME="rmlc.sh"
# Parse arguments
if [ $# -lt 3 ]; then
    echo "Usage:"
    echo "$SCRIPTNAME [-more|-less] [numlines] file1 file2..."
    exit 1
fi

if [ "$1" = "-more" ]; then
    COMPARE="-gt"
elif [ "$1" = "-less" ]; then
    COMPARE="-lt"
else
    echo "First argument must be -more or -less"
    exit 1
fi

LINECOUNT=$2

# Discard non-filename arguments
shift 2

# Quoting "$@" preserves filenames containing spaces
for filename in "$@"; do
    # Make sure we're dealing with a regular file first
    if [ ! -f "$filename" ]; then
        echo "Ignoring $filename"
        continue
    fi

    # We probably don't want to delete ourselves if script is in current dir
    if [ "$filename" = "$SCRIPTNAME" ]; then
        continue
    fi

    # Read via stdin so wc's output doesn't include the filename
    lines=$(wc -l < "$filename")

    # Check criteria and delete
    if [ "$lines" "$COMPARE" "$LINECOUNT" ]; then
        echo "Deleting $filename"
        rm "$filename"
    fi
done

Upvotes: 11

eosar

Reputation: 1

A bit late, since the question was asked long ago, but I just had the same question, and this is what I came up with, along the lines of Chad Campbell's answer:

find $DIR -name '*.txt' -exec wc -l {} \; | grep -v "$LINES" | awk '{print $2}' | xargs rm
  • The first part finds all the files in DIR ending in .txt and prints their line counts.
  • The second part selects all the files that do not have the required number of lines (LINES).
  • The third part prints just the file names.
  • And the fourth part deletes those files.
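Two caveats worth knowing: `grep -v "$LINES"` also excludes files whose *names* happen to contain the number, and `awk '{print $2}'` breaks on filenames with spaces. A variant that compares only the count field is sketched below (it assumes GNU findutils for `xargs -d`; the scratch files and the 10-line threshold are made up for the demo):

```shell
# Demo setup in a scratch directory (hypothetical filenames)
cd "$(mktemp -d)"
seq 3  > short.txt     # 3 lines  -> should be deleted
seq 10 > keep.txt      # 10 lines -> should survive

DIR=.
LINES=10

# wc -l prints "<count> <path>"; compare only the count field in awk,
# then strip the count and feed newline-delimited paths to rm.
find "$DIR" -name '*.txt' -type f -exec wc -l {} \; \
  | awk -v n="$LINES" '$1 != n { sub(/^ *[0-9]+ /, ""); print }' \
  | xargs -d '\n' rm -f --
```

The `sub()` strips the leading count rather than reprinting `$2`, so paths containing spaces come through intact.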

Upvotes: 0

Chad Campbell

Reputation: 1

Here is a one-liner option. RLINES is the number of lines to use for removal.

rm `find $DIR -type f -exec wc -l {} \; | grep "^$RLINES " | awk '{print $2}'`

Upvotes: 0

simon

Reputation: 7022

My command-line mashing is pretty rusty, but I think something like this will work safely (change the "10" in the grep to whatever line count you need), even if your filenames have spaces in them. Adjust as needed. You'd need to tweak it if newlines in filenames are possible.

find . -name \*.txt -type f -exec wc -l {} \; | grep -v "^10 .*$" | cut --complement -f 1 -d " " | tr '\012' '\000' | xargs -0 rm -f
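Run against a scratch directory, the pipeline keeps the exactly-10-line file and deletes the rest, spaces and all (the file names below are made up for the demo; `cut --complement` and `xargs -0` assume GNU tools):

```shell
cd "$(mktemp -d)"
printf 'one line\n' > "file with spaces.txt"   # 1 line   -> deleted
seq 10 > exact.txt                             # 10 lines -> kept

# Same pipeline as above: drop the count field with cut, then
# NUL-terminate the paths so xargs -0 survives embedded spaces.
find . -name \*.txt -type f -exec wc -l {} \; \
  | grep -v "^10 .*$" \
  | cut --complement -f 1 -d " " \
  | tr '\012' '\000' \
  | xargs -0 rm -f
```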

Upvotes: 1

Sathya

Reputation: 2421

This one-liner should also do the trick:

 find -name '*.txt' | xargs  wc -l | awk '{if($1 > 1000 && index($2, "txt")>0 ) print $2}' | xargs rm

In the example above, files with more than 1000 lines are deleted.

Choose > or < and the number of lines accordingly.
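For example, the fewer-than direction looks like this (a sketch; the 15-line threshold and file names are illustrative):

```shell
cd "$(mktemp -d)"
seq 5  > small.txt    # 5 lines  -> deleted
seq 20 > big.txt      # 20 lines -> kept

# Same pipeline with < instead of >; the index($2, "txt") test
# also filters out wc's trailing "total" line.
find . -name '*.txt' | xargs wc -l | awk '{if ($1 < 15 && index($2, "txt") > 0) print $2}' | xargs rm
```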

Upvotes: 3

schnaader

Reputation: 49729

Played a bit with the answer from 0x6adb015. This works for me:

LINES=10
for f in *.txt; do
  a=`cat "$f" | wc -l`;
  if [ "$a" -ne "$LINES" ]
  then
    rm -f "$f"
  fi
done
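A cautious variation on the loop above: echo the candidates first, then swap in `rm -f` once the list looks right (file names here are made up for the demo):

```shell
cd "$(mktemp -d)"
seq 10 > keep.txt     # matches LINES -> untouched
seq 4  > drop.txt     # wrong length -> reported

LINES=10
# Dry run: report what the loop *would* remove instead of deleting it.
for f in *.txt; do
  a=$(cat "$f" | wc -l)
  if [ "$a" -ne "$LINES" ]; then
    echo "would remove: $f"
  fi
done
```

Nothing is deleted; the loop only reports `drop.txt` as a candidate.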

Upvotes: 4

0x6adb015

Reputation: 7811

Try this bash script:

LINES=10
for f in *.txt; do 
  if [ `cat "$f" | wc -l` -ne $LINES ]; then 
     rm -f "$f"
  fi
done

(Not tested)

EDIT: Use a pipe to feed wc, since wc prints the filename as well when given one as an argument.
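The difference is easy to see in isolation (`sample.txt` is a made-up name for the demo; a redirection works just as well as the pipe):

```shell
cd "$(mktemp -d)"
seq 5 > sample.txt

wc -l sample.txt         # count plus filename: "5 sample.txt"
wc -l < sample.txt       # count only, safe to use inside [ ... -ne ... ]
cat sample.txt | wc -l   # the pipe from the answer has the same effect
```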

Upvotes: 1
