Revan

Reputation: 2322

Bash: extract columns with cut and filter one column further

I have a tab-separated file and want to extract a few columns with cut.

Two example lines:

(...)
0    0    1    0    AB=1,2,3;CD=4,5,6;EF=7,8,9    0    0
1    1    0    0    AB=2,1,3;CD=1,1,2;EF=5,3,4    0    1
(...)

I want to select columns 2, 3, 5 and 7, but from column 5 keep only the CD entry (CD=4,5,6 in the first line).

So my expected result is

0    1    CD=4,5,6;    0
1    0    CD=1,1,2;    1

How can I use cut for this problem and run grep on one of the extracted columns? Any other one-liner is of course also fine.

Upvotes: 1

Views: 1105

Answers (3)

Thor

Reputation: 47239

I think awk is the best tool for this kind of task, and the other two answers give you good short solutions.

I want to point out that you can use awk's built-in splitting facility to gain more flexibility when parsing input. Here is an example script that uses implicit splitting:

parse.awk

# Remember second, third and seventh columns
{
  a = $2
  b = $3
  d = $7
}

# Split the fifth column on ";". After this, the positional variables
# (e.g. $1, $2, ..., $NF) contain the subfields of the original
# fifth column
{
  oldFS = FS
  FS    = ";"
  $0    = $5
}

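# For example, to test whether the second element starts with "CD", do
# something like this. Clear c first so that a value from an earlier
# record is not reused when the test fails
{
  c = ""
}
$2 ~ /^CD/ {
  c = $2
}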

# Print the selected elements
{
  print a, b, c, d
}

# Restore FS
{
  FS = oldFS
}

Run it like this:

awk -f parse.awk FS='\t' OFS='\t' infile

Output:

0   1   CD=4,5,6    0
1   0   CD=1,1,2    1
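
The same re-splitting idea can also be written as a single command. The following is a minimal sketch rather than the script above: the helper name extract_cd and the file name sample.tsv are made up for illustration, and it loops over all subfields instead of assuming that CD is the second one.

```shell
# Sample input: the two lines from the question, tab-separated.
printf '0\t0\t1\t0\tAB=1,2,3;CD=4,5,6;EF=7,8,9\t0\t0\n' >  sample.tsv
printf '1\t1\t0\t0\tAB=2,1,3;CD=1,1,2;EF=5,3,4\t0\t1\n' >> sample.tsv

# extract_cd is a hypothetical helper name; $1 is the input file.
extract_cd() {
  awk -F'\t' -v OFS='\t' '
  {
    a = $2; b = $3; d = $7   # save outer fields before $0 is overwritten
    c = ""                   # reset so an old match is never reused
    FS = ";"; $0 = $5        # re-split: $1..$NF are now the subfields
    for (i = 1; i <= NF; i++)
      if ($i ~ /^CD=/) c = $i
    print a, b, c, d
    FS = "\t"                # restore FS before the next record is read
  }' "$1"
}

extract_cd sample.tsv
```

Restoring FS at the end of the block matters because the next record is split with whatever FS holds when it is read.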

Upvotes: 1

Barmar

Reputation: 782717

Easier done with awk. Split the 5th field using ; as the separator, and then print the second subfield.

awk 'BEGIN {FS="\t"; OFS="\t"} 
     {split($5, a, ";"); print $2, $3, a[2]";", $7 }' inputfile > outputfile

If you want to print whichever subfield begins with CD=, use a loop:

awk 'BEGIN {FS="\t"; OFS="\t"} 
     {n = split($5, a, ";");
      subfield = "";  # reset so a match from an earlier line is not reused
      for (i = 1; i <= n; i++) {
        if (a[i] ~ /^CD=/) subfield = a[i];
      }
      print $2, $3, subfield";", $7}' < inputfile > outputfile
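
If more than one key might be needed, one possible variation is to load the key=value pairs of the fifth field into an awk array and look the key up by name. This is only a sketch under assumptions: pick_key and sample.tsv are hypothetical names, and every subfield is assumed to have the form KEY=VALUE.

```shell
# Sample input: the two lines from the question, tab-separated.
printf '0\t0\t1\t0\tAB=1,2,3;CD=4,5,6;EF=7,8,9\t0\t0\n' >  sample.tsv
printf '1\t1\t0\t0\tAB=2,1,3;CD=1,1,2;EF=5,3,4\t0\t1\n' >> sample.tsv

# pick_key is a hypothetical helper: $1 is the key (e.g. CD), $2 the file.
pick_key() {
  awk -F'\t' -v OFS='\t' -v key="$1" '
  {
    split("", kv)                    # clear the lookup table for this record
    n = split($5, parts, ";")
    for (i = 1; i <= n; i++) {
      eq = index(parts[i], "=")      # position of the first "="
      if (eq > 0)
        kv[substr(parts[i], 1, eq - 1)] = substr(parts[i], eq + 1)
    }
    # Empty third column when the key is missing from this record.
    val = (key in kv) ? key "=" kv[key] ";" : ""
    print $2, $3, val, $7
  }' "$2"
}

pick_key CD sample.tsv
```

Calling pick_key EF sample.tsv instead would select the EF entry without changing the awk program.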

Upvotes: 3

karakfa

Reputation: 67567

Here is another awk solution, which treats both tab and ; as field separators:

$ awk -F'\t|;' -v OFS='\t' '{print $2,$3,$6,$NF}' file

0       1       CD=4,5,6        0
1       0       CD=1,1,2        1
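
To see why the CD entry ends up in $6 with this separator, it can help to print every field with its index. A small sketch; show_fields is a hypothetical helper name and the input is the first sample line from the question.

```shell
# show_fields is a hypothetical helper: it reads lines on stdin and
# prints "index: value" for every field when splitting on tab or ";".
show_fields() {
  awk -F'\t|;' '{ for (i = 1; i <= NF; i++) print i ": " $i }'
}

printf '0\t0\t1\t0\tAB=1,2,3;CD=4,5,6;EF=7,8,9\t0\t0\n' | show_fields
```

The three ;-separated entries become fields 5, 6 and 7, which pushes the original column 7 to the last position, hence $NF. The approach therefore relies on CD always being the second entry of the fifth column.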

Or with cut and paste:

$ paste <(cut -f2,3 file) <(cut -d';' -f2 file) <(cut -f7 file)

0       1       CD=4,5,6        0
1       0       CD=1,1,2        1
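
Since the question asks about running grep on one extracted column, here is a sketch of that combination. It assumes a grep that supports -o (GNU and BSD grep do; the option is not required by POSIX) and that every line contains exactly one CD entry, because otherwise the middle column has a different number of lines and paste misaligns the rows. The names cut_grep_cd and sample.tsv are hypothetical, and temporary files are used so the function also works in a plain sh; in Bash they could be replaced by process substitution as in the command above.

```shell
# Sample input: the two lines from the question, tab-separated.
printf '0\t0\t1\t0\tAB=1,2,3;CD=4,5,6;EF=7,8,9\t0\t0\n' >  sample.tsv
printf '1\t1\t0\t0\tAB=2,1,3;CD=1,1,2;EF=5,3,4\t0\t1\n' >> sample.tsv

# cut_grep_cd is a hypothetical helper; $1 is the input file.
cut_grep_cd() {
  tmp=$(mktemp -d) || return 1
  cut -f2,3 "$1" > "$tmp/left"                        # columns 2 and 3
  cut -f5 "$1" | grep -o 'CD=[^;]*' > "$tmp/mid"      # CD entry of column 5
  cut -f7 "$1" > "$tmp/right"                         # column 7
  paste "$tmp/left" "$tmp/mid" "$tmp/right"           # glue columns back together
  rm -rf "$tmp"
}

cut_grep_cd sample.tsv
```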

Upvotes: 4
