user1458512

Reputation: 159

How to split values in a column into separate columns

My tab-delimited file looks like this:

  ID   Pop  snp1  snp2  snp3  snp4  snp5
  AD62  1  0/1   1/1   .    1/1   0/.
  AD75  1  0/0   1/1   .    ./0   1/0
  AD89  1  .     1/0   1/1  0/0   1/.

I want to split the columns (from column 3 onward) so that each value separated by the "/" character goes into its own column. However, some columns have a missing value (they contain only the "." character), and I want these treated as if they were "./.", so that the two "." characters are also split into their own columns. For example:

  ID   Pop  snp1     snp2     snp3     snp4     snp5
  AD62  1    0    1   1    1   .    .   1    1   0    .
  AD75  1    0    0   1    1   .    .   .    0   1    0
  AD89  1    .    .   1    0   1    1   0    0   1    .

Thanks

Upvotes: 1

Views: 1185

Answers (4)

Steve

Reputation: 54392

A fairly robust way, using awk and a few if statements:

awk '{ for (i = 1; i <= NF; i++) if (i >= 3 && i < NF && NR == 1) printf "%s\t\t", $i; else if (i == NF && NR == 1) print $i; else if ($i == "." && NR >= 2) printf ".\t.%s", (i == NF ? "\n" : "\t"); else { sub ("/", "\t", $i); if (i == NF) printf "%s\n", $i; else { printf "%s\t", $i; } } }' file.txt

Broken out on multiple lines:

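awk '{ for (i = 1; i <= NF; i++)
   if (i >= 3 && i < NF && NR == 1) printf "%s\t\t", $i;   # header: extra tab after each SNP name
   else if (i == NF && NR == 1) print $i;                   # header: last field ends the line
   else if ($i == "." && NR >= 2) printf ".\t.%s", (i == NF ? "\n" : "\t");   # missing value: two "." columns
   else {
      sub ("/", "\t", $i);                                  # split "a/b" into two tab-separated values
      if (i == NF) printf "%s\n", $i;
      else {
         printf "%s\t", $i;
      }
   }
}' file.txt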

HTH
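Another way to express the same transformation, sketched here rather than taken from the answer above: set both FS and OFS to a tab and let awk's split() break each genotype on "/". It assumes a header row and exactly two "/"-separated values in every non-missing field; the function name `split_genotypes` is hypothetical. Because a lone "." is rewritten to "./." before splitting, a missing value in the last column needs no special case, and the header gets one empty column after every SNP name (including the last) so every row has the same number of fields.

```shell
# Hypothetical helper name; reads tab-separated data from the files given
# as arguments (or from standard input) and writes the split columns.
split_genotypes() {
  awk 'BEGIN { FS = OFS = "\t" }
  NR == 1 {
    # Header: keep ID and Pop, then give each SNP name an extra empty
    # column so the header has as many fields as the data rows.
    line = $1 OFS $2
    for (i = 3; i <= NF; i++) line = line OFS $i OFS ""
    print line
    next
  }
  {
    line = $1 OFS $2
    for (i = 3; i <= NF; i++) {
      # A lone "." (missing value) is treated as "./.".
      g = ($i == ".") ? "./." : $i
      # Assumes every genotype has exactly two values separated by "/".
      split(g, allele, "/")
      line = line OFS allele[1] OFS allele[2]
    }
    print line
  }' "$@"
}
```

For example: `split_genotypes file.txt > split.txt`.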

Upvotes: 0

potong

Reputation: 58351

This might work for you (GNU sed):

sed '1s/\t/&&/3g;s/\t\.\t/\t.\t.\t/g;y/\//\t/' file
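The one-liner packs three sed commands together; as a sketch, here they are one per line with a comment on each. This still assumes GNU sed (the `3g` flag and the `\t` escape are GNU extensions), and the function name `split_with_sed` is hypothetical. As in the one-liner, a lone "." in the last column, or two adjacent lone "." fields, is not expanded.

```shell
# Hypothetical helper name wrapping the GNU sed script above.
split_with_sed() {
  sed '
    # Line 1 only: double the 3rd and later tabs, so every SNP name
    # except the first two columns spans two columns.
    1s/\t/&&/3g
    # A lone "." between two tabs becomes ".<tab>.".
    s/\t\.\t/\t.\t.\t/g
    # Translate every "/" into a tab, splitting each genotype in two.
    y/\//\t/
  ' "$@"
}
```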

Upvotes: 0

A.G.

Reputation: 1319

You can use sed:

sed -e 's/\t\.\t/\t.\t.\t/g' -e 's/\//\t/g' <your_file>

Upvotes: 1

mtk

Reputation: 13709

I tried this and it works well; you can tweak it to suit your requirements.

Assuming the data is in a file called data.txt:

sed 1d data.txt | sed 's/\t\.\t/\t.\t.\t/g' | tr '/' '\t'

This gives the output for the data rows, but because sed 1d drops the header line, you need a workaround to print a header that lines up with the new columns.
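One possible workaround, sketched here assuming GNU sed (for `3g`, `\t`, and the `:a` / `ta` loop), is to convert the header and the data rows separately. The function name `split_with_header` is hypothetical. The loop rewrites each lone "." to "./." before `tr` splits on "/", which also covers adjacent missing values and a missing value in the last column.

```shell
# Hypothetical helper name; $1 is a tab-separated file with a header row.
split_with_header() {
  # Header: double every tab from the 3rd on, so each SNP name spans two columns.
  head -n 1 "$1" | sed 's/\t/&&/3g'
  # Data rows: repeat until no lone "." remains between tabs, then handle a
  # lone "." at the end of the line, and finally turn every "/" into a tab.
  sed '1d; :a; s/\t\.\t/\t.\/.\t/; ta; s/\t\.$/\t.\/./' "$1" | tr '/' '\t'
}
```

For example: `split_with_header data.txt`.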

Upvotes: 0
