Hugolpz

Reputation: 18278

How to select files with less than n lines?

Given a folder containing n files with different numbers of lines:

$ wc -l * | sort -n -r     # list and sort files by number of lines
  99860 total
  50000 mr.txt
   4976 स.txt
   4462 प.txt
   3745 म.txt
   3545 क.txt
   3195 व.txt
   2201 न.txt
   2183 ब.txt
   2134 अ.txt
   1789 र.txt
   1666 द.txt
   1623 आ.txt
   1568 ग.txt
   1524 ज.txt
   1507 त.txt
   1376 श.txt
   1132 ल.txt
   1102 ह.txt
   1089 च.txt
   1076 उ.txt
   1025 भ.txt
    809 य.txt
    791 फ.txt
    766 ख.txt
    652 ट.txt
    645 घ.txt
    480 ए.txt
    456 इ.txt
    446 ध.txt
    420 ड.txt
    318 ठ.txt
    273 झ.txt
    182 थ.txt
    163 ओ.txt
    118 छ.txt
    115 ऑ.txt
     64 ऐ.txt
     55 ढ.txt
     44 औ.txt
     29 २.txt
     26 ई.txt
     20 ष.txt
     20 ऊ.txt
     20 १.txt
     14 ऋ.txt
      6 ऱ.txt
      4 ३.txt
      2 ९.txt
      2 ८.txt
      1 ॐ.txt
      1 ४.txt

How can I select only the files with fewer than 200 lines?

I want to append their contents to a final file via >> output.txt.

Upvotes: 0

Views: 219

Answers (3)

M. Nejat Aydin

Reputation: 10133

A pure bash implementation that uses no external command-line utilities. It works for any filenames (including names that contain newline characters), and it also prevents the output file itself from being merged if it already exists:

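#!/bin/bash

outfile='merged_output.txt'
cutoff=200
for file in *; do
    # Skip the output file itself and anything that is not a regular file.
    [[ $file = "$outfile" || ! -f $file ]] && continue
    # Read at most $cutoff lines; a shorter array means a shorter file.
    mapfile -n "$cutoff" lines < "$file"
    (( ${#lines[@]} < cutoff )) && printf '%s' "${lines[@]}"
done >> "$outfile"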
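
The line-count test in that loop can also be pulled out on its own. The following is only a sketch: has_fewer_lines is a hypothetical helper name, and it assumes bash 4 or newer, since mapfile is not available in older versions.

```shell
#!/bin/bash
# has_fewer_lines is a hypothetical helper name used only for this illustration.
# Usage: has_fewer_lines FILE CUTOFF
# Succeeds (exit status 0) when FILE has fewer than CUTOFF lines.
has_fewer_lines() {
    local file=$1 cutoff=$2 lines
    # Read at most CUTOFF lines, so a huge file is never loaded in full.
    mapfile -n "$cutoff" lines < "$file"
    # Reaching the cap means the file has at least CUTOFF lines.
    (( ${#lines[@]} < cutoff ))
}
```

It could then be used as has_fewer_lines "$file" 200 && cat -- "$file" inside a loop. Because reading stops at the cutoff, the 50000-line file costs no more than a 200-line one.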

Upvotes: 2

Ed Morton

Reputation: 204015

Using any awk in any shell on every Unix box:

awk '
    FNR == 1 { printf "%s", buf; buf="" }
    { buf = buf $0 ORS }
    FNR >= 200 { buf=""; nextfile }
    END { printf "%s", buf }
' *

An awk that supports nextfile as a command will run faster than one that doesn't (which will just ignore it thinking it's an unset variable).
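
To reuse the same buffering idea with a different threshold, the cutoff can be passed in with -v. This is a minimal sketch, assuming a hypothetical wrapper named concat_short. It includes an END rule so that the buffer of the final file is flushed too.

```shell
# concat_short is a hypothetical wrapper name, not part of awk.
# Usage: concat_short CUTOFF FILE...
# Prints the contents of every FILE that has fewer than CUTOFF lines.
concat_short() {
    local cutoff="$1"
    shift
    awk -v n="$cutoff" '
        FNR == 1 { printf "%s", buf; buf = "" }   # new file: flush the previous buffer
        { buf = buf $0 ORS }                      # collect the current file
        FNR >= n { buf = ""; nextfile }           # too long: discard it and skip ahead
        END { printf "%s", buf }                  # flush the buffer of the last file
    ' "$@"
}
```

For example, concat_short 200 *.txt > ../merged.txt writes the result outside the current folder, so the merged file is not picked up by the glob.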

Upvotes: 1

Socowi

Reputation: 27255

Assuming your filenames are free of whitespace and special symbols like '"\, use

wc -l * | awk '$1 < 200 {print $2}' | xargs cat >> merged.txt

Because * expands in sorted order, the files are concatenated alphabetically.
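
If the filenames may contain whitespace or quotes, one possible variant counts each file separately instead of parsing the combined wc output. This is only a sketch: merge_short_files is a made-up helper name, and it assumes the output file is given with the same spelling it has in the file list, so that it can be skipped.

```shell
# merge_short_files is a hypothetical helper name used for illustration.
# Usage: merge_short_files CUTOFF OUTFILE FILE...
# Appends every FILE with fewer than CUTOFF lines to OUTFILE.
merge_short_files() {
    local cutoff="$1" outfile="$2" file count
    shift 2
    for file in "$@"; do
        # Skip anything that is not a regular file, and the output file itself.
        [ -f "$file" ] || continue
        [ "$file" != "$outfile" ] || continue
        # Reading from stdin makes wc print only the number, with no filename.
        count=$(wc -l < "$file")
        # $((count)) strips the leading blanks some wc implementations print.
        if [ "$((count))" -lt "$cutoff" ]; then
            cat -- "$file"
        fi
    done >> "$outfile"
}
```

Called as merge_short_files 200 merged.txt *, it keeps the alphabetical order of the glob. Since wc runs once per file, there is no "total" line to filter out.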

Upvotes: 3
