Reputation: 23
file1:
scaffold2232_size19577 gene 8878 9258
scaffold2232_size19577 CDS 8878 9258
scaffold2232_size19577 gene 10631 14562
scaffold2232_size19577 intron 10693 11242
scaffold2232_size19577 intron 11343 14252
scaffold2232_size19577 intron 14346 14499
scaffold2232_size19577 CDS 10631 10692
scaffold2232_size19577 CDS 11243 11342
scaffold2232_size19577 CDS 14253 14345
scaffold2232_size19577 CDS 14500 14562
scaffold2232_size19577 gene 18807 19055
scaffold2232_size19577 CDS 18807 19055
file2:
scaffold2232_size19577 8878 9258 Os12t0508300-01
scaffold2232_size19577 8878 9258 Os12t0508300-01
scaffold2232_size19577 10631 14562 Os12t0508300-01
scaffold2232_size19577 10693 11242 Os12t0508300-01
scaffold2232_size19577 11343 14252 Os12t0508300-01
scaffold2232_size19577 14346 14499 Os12t0508400-00
scaffold2232_size19577 14346 14499 Os12t0508400-00
scaffold2232_size19577 14346 14499 Os12t0508400-00
scaffold2232_size19577 10631 10692 Os12t0508300-01
scaffold2232_size19577 11243 11342 Os12t0508300-01
scaffold2232_size19577 14253 14345 Os12t0508400-00
scaffold2232_size19577 14253 14345 Os12t0508400-00
scaffold2232_size19577 14253 14345 Os12t0508400-00
scaffold2232_size19577 14500 14562 Os12t0508400-00
scaffold2232_size19577 14500 14562 Os12t0508400-00
scaffold2232_size19577 14500 14562 Os12t0508400-00
scaffold2232_size19577 18807 19055 Os12t0508400-00
scaffold2232_size19577 18807 19055 Os12t0508400-00
scaffold2232_size19577 18807 19055 Os12t0508400-00
scaffold2232_size19577 18807 19055 Os12t0508400-00
scaffold2232_size19577 18807 19055 Os12t0508400-00
scaffold2232_size19577 18807 19055 Os12t0508400-00
desired output:
scaffold2232_size19577 8878 9258 Os12t0508300-01 gene
scaffold2232_size19577 8878 9258 Os12t0508300-01 CDS
scaffold2232_size19577 10631 14562 Os12t0508300-01 gene
scaffold2232_size19577 10693 11242 Os12t0508300-01 intron
scaffold2232_size19577 11343 14252 Os12t0508300-01 intron
scaffold2232_size19577 14346 14499 Os12t0508400-00 intron
scaffold2232_size19577 10631 10692 Os12t0508300-01 CDS
scaffold2232_size19577 11243 11342 Os12t0508300-01 CDS
scaffold2232_size19577 14253 14345 Os12t0508400-00 CDS
scaffold2232_size19577 14500 14562 Os12t0508400-00 CDS
scaffold2232_size19577 18807 19055 Os12t0508400-00 gene
scaffold2232_size19577 18807 19055 Os12t0508400-00 CDS
I tried: awk '{a[$1,$2,$3]=$0}END{for(i in a) print a[i]}' file2
but this loses one of the gene/CDS lines, because they have the same coordinates in columns 2 and 3, so the output is:
scaffold2232_size19577 8878 9258 Os12t0508300-01
scaffold2232_size19577 10631 14562 Os12t0508300-01
scaffold2232_size19577 10693 11242 Os12t0508300-01
scaffold2232_size19577 11343 14252 Os12t0508300-01
scaffold2232_size19577 14346 14499 Os12t0508400-00
scaffold2232_size19577 10631 10692 Os12t0508300-01
scaffold2232_size19577 11243 11342 Os12t0508300-01
scaffold2232_size19577 14253 14345 Os12t0508400-00
scaffold2232_size19577 14500 14562 Os12t0508400-00
scaffold2232_size19577 18807 19055 Os12t0508400-00
I thought I could then add column 2 of file1 to file2, but after this awk operation file2 has fewer rows than file1, so I cannot add them line by line. I want the result to look like my desired output.
Upvotes: 0
Views: 40
Reputation: 41446
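Something like this? It reads f2 first and stores each ID under its start and end coordinates, then looks up every line of f1 by the same coordinates: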
awk 'FNR==NR {a[$2FS$3]=$4;next} {print $1,$3,$4,a[$3FS$4],$2}' OFS="\t" f2 f1
scaffold2232_size19577 8878 9258 Os12t0508300-01 gene
scaffold2232_size19577 8878 9258 Os12t0508300-01 CDS
scaffold2232_size19577 10631 14562 Os12t0508300-01 gene
scaffold2232_size19577 10693 11242 Os12t0508300-01 intron
scaffold2232_size19577 11343 14252 Os12t0508300-01 intron
scaffold2232_size19577 14346 14499 Os12t0508400-00 intron
scaffold2232_size19577 10631 10692 Os12t0508300-01 CDS
scaffold2232_size19577 11243 11342 Os12t0508300-01 CDS
scaffold2232_size19577 14253 14345 Os12t0508400-00 CDS
scaffold2232_size19577 14500 14562 Os12t0508400-00 CDS
scaffold2232_size19577 18807 19055 Os12t0508400-00 gene
scaffold2232_size19577 18807 19055 Os12t0508400-00 CDS
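If the files ever cover more than one scaffold, the same start/end pair could turn up on two scaffolds, so it may be safer to key on column 1 as well. A minimal sketch of that variant follows; the function name `annotate` and the `NA` placeholder for coordinates that are missing from file2 are my own assumptions, not something from the question.

```shell
# annotate ID_FILE FEATURE_FILE
# ID_FILE      columns: scaffold start end id      (file2 in the question)
# FEATURE_FILE columns: scaffold type start end    (file1 in the question)
annotate() {
    awk 'BEGIN { OFS = "\t" }
         # First file: remember the ID for each scaffold/start/end triple.
         # Duplicate lines simply overwrite the same entry.
         FNR == NR { id[$1, $2, $3] = $4; next }
         # Second file: look the triple up and append the feature type.
         {
             key = $1 SUBSEP $3 SUBSEP $4
             print $1, $3, $4, ((key in id) ? id[key] : "NA"), $2
         }' "$1" "$2"
}
```

Call it as `annotate f2 f1`. Because the second file drives the output, every gene/CDS/intron line of file1 is kept in its original order, even when two lines share the same coordinates.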
Upvotes: 1