Luca

Reputation: 59

Length comparison of one specific field in Linux

I am trying to check the length of the second field of a TSV file (hundreds of thousands of lines), but my script runs very slowly. I suspect the problem is the "echo" call, but I am not sure how to fix it.

Input file:

prob    name
1.0     Claire
1.0     Mark
...     ...
0.9     GFGKHJGJGHKGDFUFULFD

I need to print the names that look wrong (here, names longer than 10 characters). I tested my script on a small sample using "head -100" and it worked, but it cannot cope with the original file.

This is what I ran:

for line in `cat filename | cut -f2`;do
length=`echo -n "$line" | wc -m`
if [ "$length" -gt 10 ];then
echo "$line"
fi
done

Upvotes: 0

Views: 302

Answers (3)

oliv

Reputation: 13259

awk to the rescue:

awk 'length($2)>10' file

This will print every line whose second field is longer than 10 characters.

Note that it doesn't require any block statement {...} because if the condition is met, awk will by default print the line.
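Since the input is a TSV, note that awk splits on any whitespace by default, so a name containing a space would be cut at that space. A minimal sketch of a tab-only variant, assuming the file really is tab-delimited; the sample rows (including "Mary Ann Smith") are made up for illustration:

```shell
# Hypothetical sample data standing in for the real file.
tmp=$(mktemp)
printf 'prob\tname\n1.0\tClaire\n0.9\tGFGKHJGJGHKGDFUFULFD\n0.8\tMary Ann Smith\n' > "$tmp"

# -F'\t' makes awk split on tabs only, so "Mary Ann Smith" stays one field.
long_names=$(awk -F'\t' 'length($2) > 10' "$tmp")
printf '%s\n' "$long_names"

rm -f "$tmp"
```

With the default separator, $2 of the last row would be just "Mary" and the row would not be reported.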

Upvotes: 1

Shravan Yadav

Reputation: 1317

We can use awk if that helps.

awk '{if(length($2) > 10){print}}' filename

Here $2 is the second field of each line in filename; awk evaluates the condition once per line. This should be faster than the shell loop.

Upvotes: 1

Igor S.K.

Reputation: 1039

Try this:

cat file.tsv | awk '{if (length($2) > 10) print $0;}'

This should be faster, since all of the processing is done by a single awk process, whereas your solution starts 2 processes per loop iteration just to make that comparison.
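If you would rather keep a shell loop, the per-iteration processes can be avoided by letting the shell measure the string itself with ${#var}. A rough sketch, assuming a tab-delimited file with exactly two columns; the sample data and the variable names prob, name and long_names are only for illustration:

```shell
# Hypothetical sample data standing in for the real file.
tmp=$(mktemp)
printf 'prob\tname\n1.0\tClaire\n0.9\tGFGKHJGJGHKGDFUFULFD\n' > "$tmp"

tab=$(printf '\t')

# read splits each line on tabs; ${#name} is the length of the string,
# computed by the shell without starting echo or wc.
long_names=$(
  while IFS="$tab" read -r prob name; do
    if [ "${#name}" -gt 10 ]; then
      printf '%s\t%s\n' "$prob" "$name"
    fi
  done < "$tmp"
)
printf '%s\n' "$long_names"

rm -f "$tmp"
```

awk is still likely to be the quicker option on a file of this size, but this removes the cost of launching new processes for every line.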

Upvotes: 1
