Cutting a specific column from a space delimited file

Question

I ran an hmmscan analysis on a FASTA file and requested tabular output with the --tblout option. This format is deliberately space-delimited (rather than tab-delimited) and justified into aligned columns.

The file looks like this (this is just a format example):

targetname accession queryname    accession  e-value score bias
x_x_x      PFyyyy.y  ContigXXX_0  -          x.xe-xx yy.y  x.x
x          PFyyyy.yy ContigXXX_1  -          xe-x    yy.y  x.x
x_x        PFyyyy.y  ContigXXX_2  -          xe-xx    y.y  x.x
x_x_x      PFyyyy.yy ContigXXX_3  -          x.xe-x  yy.y  x.x
.
..

where the target names are, for example: Methyltransf, Dimer_tnp_hAT, or Nucleotide_trans;

the first accession column holds values such as: PF13847.1, PF03407.11, or PF01958.13;

the query names are, for example: Contig244_1, Contig44245_3, or Contig12345_6;

the second accession column is always: -

the e-values are, for example: 4.0e-10 or 3.5e-15, etc.;

and score and bias are numbers in this format: xx.x

What I'd like to do is extract the queryname column, which lists every ContigXXX_X with a significant hit to a protein domain.

After this I'll be able to sort them, keep only the first occurrence of each Contig, and compare the file with the results from BlastP and BlastX (where I was already able to get the list of my Contigs that have hits to the nr database).

So my question is: how can I extract the column that contains all my Contigs? I've tried the grep, sed, and cut commands, but I haven't found the right one yet.

I'm new to Unix and still learning, so any suggestions will be really appreciated.

And if my question is not clear just tell me, I can modify it!

Vijay · Accepted Answer

awk 'NR!=1{print $3}' your_file

or

perl -lane 'print $F[2] if $. != 1' your_file
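
To cover the follow-up step from the question (keeping one copy of each Contig), here is a minimal sketch. The file name hits.tbl and the numeric values in it are made up for illustration; only the column layout follows the example in the question. Real --tblout files usually also contain comment lines starting with #, so the awk filter skips those as well. The second command shows one way to make cut work, assuming the lines have no leading spaces: cut treats every single space as a separator, so runs of spaces have to be squeezed with tr -s first.

```shell
# Hypothetical sample file in the layout shown in the question
# (scores and e-values are invented placeholders).
cat > hits.tbl <<'EOF'
targetname accession queryname    accession  e-value score bias
Methyltransf      PF13847.1  Contig244_1    -  4.0e-10 38.2 0.1
Dimer_tnp_hAT     PF03407.11 Contig44245_3  -  3.5e-15 54.0 0.0
Nucleotide_trans  PF01958.13 Contig244_1    -  2.1e-08 31.7 0.3
EOF

# Skip the header line and any '#' comment lines, print the third
# whitespace-separated field, then keep one copy of each contig name.
contigs=$(awk 'NR > 1 && !/^#/ {print $3}' hits.tbl | sort -u)
printf '%s\n' "$contigs"

# Same idea with cut: drop the header, squeeze repeated spaces into one,
# then take the third space-delimited field.
contigs_cut=$(tail -n +2 hits.tbl | tr -s ' ' | cut -d ' ' -f 3 | sort -u)
```

Redirecting the output (for example with > contigs.txt, a name chosen here only as an example) gives a list that can be compared against the BlastP and BlastX contig lists.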
