Reputation: 18278
Given a folder with n files with different numbers of lines:
$ wc -l * | sort -n -r # list and sort files by number of lines
99860 total
50000 mr.txt
4976 स.txt
4462 प.txt
3745 म.txt
3545 क.txt
3195 व.txt
2201 न.txt
2183 ब.txt
2134 अ.txt
1789 र.txt
1666 द.txt
1623 आ.txt
1568 ग.txt
1524 ज.txt
1507 त.txt
1376 श.txt
1132 ल.txt
1102 ह.txt
1089 च.txt
1076 उ.txt
1025 भ.txt
809 य.txt
791 फ.txt
766 ख.txt
652 ट.txt
645 घ.txt
480 ए.txt
456 इ.txt
446 ध.txt
420 ड.txt
318 ठ.txt
273 झ.txt
182 थ.txt
163 ओ.txt
118 छ.txt
115 ऑ.txt
64 ऐ.txt
55 ढ.txt
44 औ.txt
29 २.txt
26 ई.txt
20 ष.txt
20 ऊ.txt
20 १.txt
14 ऋ.txt
6 ऱ.txt
4 ३.txt
2 ९.txt
2 ८.txt
1 ॐ.txt
1 ४.txt
How can I select the files with fewer than 200 lines, so that I can feed them via >> output.txt
into a final file?
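For one known small file the append step itself is simple, e.g. (taking थ.txt, 182 lines, from the listing above):
$ cat थ.txt >> output.txt
The question is how to pick out the whole set of sub-200-line files automatically.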
Upvotes: 0
Views: 219
Reputation: 10133
An implementation in pure bash, without using any external command-line utilities. This works for any filenames (including filenames containing newline characters), and it also prevents the output file itself from being merged if it already exists:
#!/bin/bash
outfile='merged_output.txt'
cutoff=200

for file in *; do
    # Skip the output file itself and anything that is not a regular file
    [[ $file = "$outfile" || ! -f $file ]] && continue
    # Read at most $cutoff lines, keeping each line's trailing newline
    mapfile -n "$cutoff" lines < "$file"
    # Fewer than $cutoff lines were read, so the whole file is below the limit
    (( ${#lines[@]} < cutoff )) && printf '%s' "${lines[@]}"
done >> "$outfile"
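As a design note, mapfile -n "$cutoff" stops reading after at most that many lines, so a huge file is never read in full just to learn it is over the limit. A minimal sketch of that behavior, with a hypothetical limit of 3 lines:
$ mapfile -n 3 sample < <(printf '%s\n' a b c d e)
$ echo "${#sample[@]}"
3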
Upvotes: 2
Reputation: 204015
Using any awk in any shell on every Unix box:
awk '
FNR == 1   { printf "%s", buf; buf="" }  # new file: flush the previous one if it was kept
           { buf = buf $0 ORS }          # accumulate the current file
FNR >= 200 { buf=""; nextfile }          # 200 lines or more: discard and skip ahead
END        { printf "%s", buf }          # flush the final file
' *
An awk that supports nextfile as a command will run faster than one that does not (one that doesn't will simply ignore it, treating it as a reference to an unset variable).
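As a side note on the mechanics, FNR resets to 1 at the start of each input file while NR keeps counting across all files, which is why FNR == 1 detects a file boundary. A tiny hypothetical demonstration:
$ printf 'a\nb\n' > x.txt; printf 'c\n' > y.txt
$ awk '{ print FILENAME, FNR, NR }' x.txt y.txt
x.txt 1 1
x.txt 2 2
y.txt 1 3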
Upvotes: 1
Reputation: 27255
Assuming your filenames are free of whitespace and of special symbols such as '"\ , use:
wc -l * | awk '$1 < 200 {print $2}' | xargs cat >> merged.txt
Because * expands in sorted order, the concatenation is alphabetical.
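If the filenames may contain whitespace after all, a hedged alternative sketch in bash, assuming the merged file lives outside the folder so that a second run does not re-read it:
find . -maxdepth 1 -type f -print0 |
while IFS= read -r -d '' f; do
    [ "$(wc -l < "$f")" -lt 200 ] && cat "$f"
done >> ../merged.txt
Here wc -l < "$f" prints only the count (no filename), and the NUL-delimited find output keeps even newline-containing names intact.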
Upvotes: 3