Rik

Reputation: 1987

Print only lines with more than $1 words

I'd like to print only the lines that contain $1 words or more. Please help.

while read line ; do
    echo $line | wc -w 
done < t1.txt

Upvotes: 2

Views: 1659

Answers (2)

Walter A

Reputation: 20022

Two things to remember when using a while loop:
1. Use read -r rather than plain read, so that backslashes in the input are kept literally.
2. Keep external commands out of the loop body (your loop currently calls wc there).

When you want to process lines with while and an external utility, try to move the utility outside the loop. Inside the loop it is called once for every line; outside the loop it is called only once. You might expect to put the preprocessing chain of commands in front of the while loop:

cmd1 | cmd2 | cmd3 | while read -r line; do
   echo "This ${line} has been preprocessed."
done

This solution has one big drawback: the while loop runs in a subshell, so any variables set or changed inside the loop are lost once the loop ends.
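A quick way to see this drawback is the minimal sketch below. The function name count_in_pipeline and the three-line printf input are made up for the demonstration, and the result assumes bash with default settings (or dash); shells such as ksh and zsh run the last pipeline stage in the current shell and behave differently.

```shell
# Hypothetical demo: count lines inside a loop that is fed by a pipeline.
count_in_pipeline() {
    count=0
    printf 'a\nb\nc\n' | while read -r line; do
        # This increment happens in the subshell that runs the loop.
        count=$((count + 1))
    done
    # Back in the calling shell, count still has its original value.
    echo "$count"
}

count_in_pipeline
```

In bash the function reports 0 rather than 3, because the increments were made in the subshell's copy of count.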

You can avoid this with process substitution:

while read -r line; do
   echo "This ${line} has been preprocessed."
done < <(cmd1 | cmd2 | cmd3)
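As a rough check that variables survive with this form, here is a small sketch. The function name count_with_procsub and the printf input are assumptions for illustration, and it needs bash, since process substitution is not part of POSIX sh.

```shell
#!/usr/bin/env bash
# Hypothetical demo: the loop reads from a process substitution, so it runs
# in the current shell and its variable changes are kept.
count_with_procsub() {
    local count=0 line
    while read -r line; do
        count=$((count + 1))
    done < <(printf 'a\nb\nc\n')
    echo "$count"
}

count_with_procsub
```

Here the function reports 3, because only the command inside <( ) runs in a separate process.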

Now let's focus on cmd1 | cmd2 | cmd3. How do you select the lines that have at least 3 (or ${n}) words? You need to tune the command to how you define a word. Is word<space><space>word a line with 2 words, or a line with an empty second word and a third word? Try different ways of parsing t1.txt:

awk # syntax not included here
grep ".* .* .*" t1.txt # Difficult to use $n
grep -E "^(\w+ *){3,}" t1.txt
grep -E "^(\w+ *){$n,}" t1.txt
sed -n '/.* .* .*/p' t1.txt

The output of these commands can be redirected to the while loop, but for your basic requirements the while loop can be skipped.
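If you do want to keep the loop, for example because each line needs further handling in the shell, the word count can be done with builtins so that no external command runs per line. This is only a sketch: the function name print_lines_with_min_words is made up, it needs bash (arrays and here-strings), and it treats any run of whitespace as a single separator.

```shell
#!/usr/bin/env bash
# Hypothetical helper: print the lines of a file that have at least min words.
# $1 = minimum word count, $2 = file name
print_lines_with_min_words() {
    local min=$1 file=$2 line
    local -a words
    # IFS= and -r keep each line exactly as it is in the file; the || part
    # also handles a last line that has no trailing newline.
    while IFS= read -r line || [[ -n $line ]]; do
        # Split the line on whitespace into an array using builtins only.
        read -r -a words <<< "$line"
        if (( ${#words[@]} >= min )); then
            printf '%s\n' "$line"
        fi
    done < "$file"
}
```

Called as print_lines_with_min_words 3 t1.txt, it prints the lines with three or more words. Change >= to > if you want strictly more than the given number.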

Upvotes: 1

zzevannn

Reputation: 3724

Assuming you're defining a word as characters delimited by spaces, then awk would do this easily:

awk -v COUNT="$1" 'NF>COUNT' t1.txt

It passes the first argument in as an awk variable named COUNT and prints the lines whose number of whitespace-delimited fields is greater than that count. Use NF>=COUNT instead if lines with exactly that many words should be printed too.

e.g.

$ echo $COUNT
3
$ cat t1.txt
hey
hey hey hey hey hey
hey hey hey
hey hey hey
hey hey hey hey hey
hey hey hey hey hey
hey hey hey

$ awk -v COUNT=$COUNT 'NF>COUNT' t1.txt
hey hey hey hey hey
hey hey hey hey hey
hey hey hey hey hey
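The question title says "more than" while the body says "or more", so here is a sketch of the "or more" variant wrapped in a function. The name lines_with_at_least and its argument order are assumptions, not part of the original question.

```shell
# Hypothetical wrapper around awk.
# $1 = minimum word count, $2 = file name
lines_with_at_least() {
    # NF is awk's field count for the current line; with the default field
    # separator, runs of spaces or tabs count as one delimiter.
    awk -v min="$1" 'NF >= min' "$2"
}
```

With the t1.txt above, lines_with_at_least 3 t1.txt would also print the three-word lines, while lines_with_at_least 4 t1.txt gives the same output as the NF>COUNT run.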

Upvotes: 8
