AnkP

Reputation: 651

Find the first value matching a substring

The 9th column holds multiple values separated by ";". I am trying to find the first occurrence of the string after "name_id" in column $9 of a tab-delimited file. The first line of the file looks like this, for example:

1   NY  state   3102016 3102125 .   +   .   name_id "ENSMUSG8868"; trans_id "ENSMUST00000082908"; number "1"; id_name "Gm26206";ex_id "ENSMUSE000005";

There are multiple values separated by ";" in the 9th column. I could only come up with this command, which pulls out the last ID, "ENSMUSE000005":

sed 's|.*"\([0-9_A-Z]\+\)".*|\1|' input.txt | head

Can this be done with a regex in awk? Thanks a lot!

Upvotes: 0

Views: 44

Answers (1)

P....

Reputation: 18351

echo "$x" | awk -F';' '{split($1, a, " "); gsub(/"/, "", a[10]); print a[10]}'
ENSMUSG8868

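Here x is a shell variable holding one line of your file. Splitting on ";" leaves everything up to the first semicolon in $1; splitting that on whitespace puts the quoted name_id value in a[10], and gsub strips the quotes.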
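Since the question asks for a regex in awk, here is a minimal sketch of that approach using match() and substr(). It assumes the columns really are separated by tabs, so the attributes sit in $9; first_name_id is a hypothetical helper name, not something from the question.

```shell
# Hypothetical helper: print the name_id value from column 9 of each line.
# Assumes tab-separated input, as described in the question.
first_name_id() {
  awk -F'\t' '
    # match() sets RSTART and RLENGTH for the leftmost match in field 9
    match($9, /name_id "[^"]+"/) {
      # skip the 9-character prefix (name_id, a space, the opening quote)
      # and drop the closing quote
      print substr($9, RSTART + 9, RLENGTH - 10)
    }
  ' "$@"
}

# Example with the sample line from the question (tabs written as \t)
printf '1\tNY\tstate\t3102016\t3102125\t.\t+\t.\tname_id "ENSMUSG8868"; trans_id "ENSMUST00000082908"; number "1"; id_name "Gm26206";ex_id "ENSMUSE000005";\n' | first_name_id
```

To run it on the file itself, pass the file name: first_name_id input.txt. If the columns turn out to be separated by spaces rather than tabs, matching against $0 instead of $9 should behave the same way.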

Based on the OP's comments, to print the first column as well:

echo "$x" | awk -F';' '{split($1, a, " "); gsub(/"/, "", a[10]); print a[1], a[10]}'
1 ENSMUSG8868
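As for why the sed command in the question returns the last ID: the leading .* is greedy, so it swallows as much of the line as it can before the final quoted string. A minimal sketch of a fix, assuming the first quoted string on the line is always the name_id value (first_quoted is a hypothetical helper name):

```shell
# Hypothetical helper: print the first double-quoted string on each line.
first_quoted() {
  # [^"]* cannot cross a quote, so the capture group is the first quoted value
  sed 's/^[^"]*"\([^"]*\)".*/\1/' "$@"
}

# Example with the sample line from the question (tabs written as \t)
printf '1\tNY\tstate\t3102016\t3102125\t.\t+\t.\tname_id "ENSMUSG8868"; trans_id "ENSMUST00000082908"; number "1"; id_name "Gm26206";ex_id "ENSMUSE000005";\n' | first_quoted
```

Lines that contain no quoted string are printed unchanged, so this only fits files where every line has the attribute column.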

Upvotes: 2
