john

Reputation: 263

Summarizing the contents of a text file into another one using awk

I have a big text file with 2 tab-separated fields. As you can see in the small example below, every 2 lines share a common ID in the first column (the second line of each pair carries an extra "_2" suffix). I want to summarize the file by finding the lines that share an ID and summing the second column of those lines.

small example:

ENST00000054666.6   2
ENST00000054666.6_2 15
ENST00000054668.5   4
ENST00000054668.5_2 10
ENST00000054950.3   0
ENST00000054950.3_2 4

expected output:

ENST00000054666.6   17
ENST00000054668.5   14
ENST00000054950.3   4

As you can see, both columns change. In the 1st column each ID appears only once, without the "_2" suffix, and in the 2nd column the value is the sum of the values from both lines that share that ID in the input file.

I tried this code, but it does not return what I want:

awk -F '\t' '{ col2 = $2, $2=col2; print }' OFS='\t' input.txt > output.txt

Do you know how to fix it?

Upvotes: 1

Views: 55

Answers (2)

sjsam

Reputation: 21965

If order is not a concern, the following may also help:

awk -v FS="\t|_" '{count[$1]+=$NF}
                 END{for(i in count){printf "%s\t%s%s",i,count[i],ORS;}}' file
ENST00000054668.5   14
ENST00000054950.3   4
ENST00000054666.6   17

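Edit: If the order of the output does matter, the approach below, which uses a counter as a flag, helps. It assumes every ID occupies exactly two consecutive lines, as in the example: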

$ awk -v FS="\t|_" '{count[$1]+=$NF;++i;
                     if(i==2){printf "%s\t%s%s",$1,count[$1],ORS;i=0}}' file

ENST00000054666.6   17
ENST00000054668.5   14
ENST00000054950.3   4
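
If the lines for one ID are not guaranteed to arrive as adjacent pairs, a variant that remembers the order in which each ID first appears might be safer. This is only a sketch: the function name sum_by_id is made up for illustration, and it assumes tab-separated input where the suffix to strip is a single trailing "_something" part.

```shell
# Hypothetical helper (name is an assumption): sums column 2 per ID
# and prints the IDs in the order they were first seen.
sum_by_id() {
  awk -F '\t' -v OFS='\t' '
    {
      id = $1
      sub(/_[^_]*$/, "", id)              # strip a trailing "_2"-style suffix
      if (!(id in sum)) order[++n] = id   # remember first appearance
      sum[id] += $2
    }
    END { for (k = 1; k <= n; k++) print order[k], sum[order[k]] }
  ' "$@"
}

# Demo on the sample data from the question (reads stdin when no file is given).
printf 'ENST00000054666.6\t2\nENST00000054666.6_2\t15\nENST00000054668.5\t4\nENST00000054668.5_2\t10\nENST00000054950.3\t0\nENST00000054950.3_2\t4\n' | sum_by_id
```

With a real file it could be called as sum_by_id input.txt > output.txt. The extra order array costs one entry per distinct ID, which avoids relying on the unspecified iteration order of for (i in count).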

Upvotes: 3

RavinderSingh13

Reputation: 133518

Solution 1: The following awk may help you with this.

awk '{sub(/_.*/,"",$1)} {a[$1]+=$NF} END{for(i in a){print i,a[i]}}'   Input_file

Solution 2: In case your Input_file is sorted by the 1st field, the following may help you.

awk '{sub(/_.*/,"",$1)} prev!=$1 && prev{print prev,val;val=""} {val+=$NF;prev=$1} END{if(prev!=""){print prev,val}}'  Input_file

Append > output.txt to either of the above commands in case you need the output in an output file too.
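
As a rough illustration of the redirect, here is Solution 1 writing to a file. The scratch directory and the -v OFS='\t' setting are my additions for the demo (print uses a single space between fields by default, while the expected output in the question looks tab-separated); adjust the paths to your own files.

```shell
# Work in a scratch directory so no real files are touched (demo assumption).
tmp=$(mktemp -d)
printf 'ENST00000054666.6\t2\nENST00000054666.6_2\t15\n' > "$tmp/input.txt"

# -v OFS='\t' makes print emit a tab between fields instead of a space;
# "> file" sends awk's standard output to that file.
awk -v OFS='\t' '{sub(/_.*/,"",$1)} {a[$1]+=$NF} END{for(i in a){print i,a[i]}}' \
    "$tmp/input.txt" > "$tmp/output.txt"

cat "$tmp/output.txt"
```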

Upvotes: 3
