HKJ3

Reputation: 477

How to use awk to count the occurrences of a word beginning with something?

I have a file that looks like this:

**FID IID**
1   RQ50131-0
2   469314
3   469704
4   469712
5   RQ50135-2
6   469720
7   470145

I want to use awk to count the occurrences of IDs beginning with 'RQ' in column 2. For the small sample above, the count should be 2. The digits after RQ differ, so I want to count anything that begins with RQ.

I am using this code

awk -F '\t' '{if(match("^RQ$",$2))print}'|wc -l  ID.txt > RQ.txt

But I don't get an output.

Upvotes: 0

Views: 50

Answers (3)

The fourth bird

Reputation: 163362

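Apart from the reversed arguments to match, the file name ID.txt should come right after the closing single quote of the awk program, before the pipe. In your command awk has no input file, so it reads standard input (the terminal, if nothing is piped in), while wc -l ID.txt counts every line of ID.txt rather than the lines awk selected.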
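A small sketch of why the file position matters, assuming a POSIX shell with mktemp available: when wc -l is given a file operand, it counts that file and ignores whatever is piped into it. The temporary file is a hypothetical stand-in for ID.txt with three made-up lines.

```shell
#!/bin/sh
# Hypothetical stand-in for ID.txt with three lines.
tmp=$(mktemp)
printf '1\n2\n3\n' > "$tmp"

# wc -l has a file operand here, so the piped line is ignored and the
# three lines of the file are counted instead.
from_file=$(printf 'piped line\n' | wc -l "$tmp" | awk '{ print $1 }')

# Without a file operand, wc -l counts what arrives on standard input.
# tr strips the padding some wc implementations add.
from_pipe=$(printf 'piped line\n' | wc -l | tr -d ' ')

echo "$from_file $from_pipe"   # prints 3 1
rm -f "$tmp"
```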

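As you want to print the whole line, you can drop both the if statement and the print statement: match returns the position at which the matched substring begins, or 0 if there is no match, so match($2,"^RQ") works as a pattern on its own, and awk's default action prints each line it selects.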

awk 'match($2,"^RQ")' ID.txt | wc -l > RQ.txt
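To try this on the sample from the question, the sketch below writes those rows, tab-separated, to a temporary file standing in for ID.txt (the tab separator is an assumption; the question does not say which delimiter the real file uses). It prints the two selected lines and then their count.

```shell
#!/bin/sh
# Sample rows copied from the question, written tab-separated to a
# temporary file that stands in for ID.txt.
tmp=$(mktemp)
printf 'FID\tIID\n1\tRQ50131-0\n2\t469314\n3\t469704\n4\t469712\n5\tRQ50135-2\n6\t469720\n7\t470145\n' > "$tmp"

# match($2, "^RQ") is nonzero only when field 2 starts with RQ, so it
# selects those lines and awk prints them with its default action.
awk 'match($2, "^RQ")' "$tmp"

# Count the selected lines; tr strips the padding some wc versions add.
count=$(awk 'match($2, "^RQ")' "$tmp" | wc -l | tr -d ' ')
echo "$count"   # prints 2
rm -f "$tmp"
```

The header line is harmless here because its second field, IID, does not start with RQ.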

Upvotes: 0

Daweo

Reputation: 36500

You did

{if(match("^RQ$",$2))print}

but the match function takes its required arguments in the order string, then regexp. Also, do not use $ when you are looking for strings that start with something, because $ anchors the end of the string. After fixing those issues, the code would be

{if(match($2,"^RQ"))print}
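A quick check of both fixes, using awk's BEGIN block so no input file is needed; the ID tested is the first RQ entry from the question's sample. It compares the corrected call with the reversed call and with the original ^RQ$ pattern.

```shell
#!/bin/sh
# Corrected order: match(string, regexp) returns the 1-based start of the
# match, or 0 if there is none.
right=$(awk 'BEGIN { print match("RQ50131-0", "^RQ") }')

# Reversed order: "^RQ" becomes the text being searched and the ID is used
# as the regexp, so nothing matches.
wrong=$(awk 'BEGIN { print match("^RQ", "RQ50131-0") }')

# Trailing $: only the exact string "RQ" would match, not "RQ50131-0".
anchored=$(awk 'BEGIN { print match("RQ50131-0", "^RQ$") }')

echo "right=$right wrong=$wrong anchored=$anchored"   # prints right=1 wrong=0 anchored=0
```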

Disclaimer: this answer only fixes the problems in your current code; it does not suggest other ways to improve it.

Upvotes: 2

Wiktor Stribiżew

Reputation: 626871

By default awk splits fields on runs of spaces and tabs, so you can omit -F '\t'.
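A minimal sketch of the default field splitting, using one line from the question's sample. The sample as posted shows spaces between the columns; if the real file were space-separated (an assumption, not something the question confirms), -F '\t' would leave $2 empty, which is another way the original command could find nothing.

```shell
#!/bin/sh
# Default FS: runs of spaces and tabs both separate fields, so field 2 is
# the same for a tab-separated and a space-separated line.
tab_field=$(printf '1\tRQ50131-0\n' | awk '{ print $2 }')
space_field=$(printf '1   RQ50131-0\n' | awk '{ print $2 }')

# With -F '\t', only tabs separate fields: the space-separated line stays
# entirely in $1 and $2 is empty.
tab_only=$(printf '1   RQ50131-0\n' | awk -F '\t' '{ print $2 }')

echo "default: '$tab_field' '$space_field'  -F tab: '$tab_only'"
```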

You can use

awk '$2 ~ /^RQ/{cnt++} END{print cnt}' ID.txt > RQ.txt

Each time field 2 starts with RQ, cnt is incremented; once the whole file has been processed, the END block prints cnt.
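The one-liner can be tried on the question's sample rows; this sketch writes them, tab-separated, to a temporary file in place of ID.txt. One small variation is an addition here, not part of the answer: print cnt+0 instead of print cnt. With no matching lines cnt is never set, so plain print cnt outputs an empty line, while cnt+0 prints 0.

```shell
#!/bin/sh
# Sample rows from the question in a temporary file standing in for ID.txt.
tmp=$(mktemp)
printf 'FID\tIID\n1\tRQ50131-0\n2\t469314\n3\t469704\n4\t469712\n5\tRQ50135-2\n6\t469720\n7\t470145\n' > "$tmp"

# Increment cnt for every line whose field 2 starts with RQ; cnt+0 makes
# the END block print 0 instead of an empty line when nothing matched.
count=$(awk '$2 ~ /^RQ/ { cnt++ } END { print cnt+0 }' "$tmp")

# Same program on input that contains no RQ IDs.
none=$(printf 'FID\tIID\n2\t469314\n' | awk '$2 ~ /^RQ/ { cnt++ } END { print cnt+0 }')

echo "$count $none"   # prints 2 0
rm -f "$tmp"
```

Unlike the match version piped into wc -l, this counts inside awk, so no second process is needed.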

See the online demo.

Upvotes: 2
