Vikas Dubey
Vikas Dubey

Reputation: 37

Converting Rows into Columns using awk or sed

I have a file with *.xvg format.
It contains six columns with 500 numbers each.
Except the time column (first column) all other columns contain floats.

I want to generate an output file in same format, in which these columns are converted into rows with each number separated by space.

I have written a program in C, which works fine for me but I am looking for an alternative way using awk or sed, which will allow me to do the same.

I am absolutely new to these scripting languages. I couldn't find any relevant answer for me in previously asked questions. So, If somebody can help me out with this task I will be grateful.

Input file looks like this :-

  # This file was created Thu Oct  1 17:18:10 2015
  # by the following command:
  # /home/durba/gmx455/bin/mdrun -np 1 -deffnm md0 -v 
  #
  @    title "dH/d\xl\f{}, \xD\f{}H"
  @    xaxis  label "Time (ps)"
  @    yaxis  label "(kJ/mol)"
  @TYPE xy
  @ subtitle "T = 200 (K), \xl\f{} = 0"
  @ view 0.15, 0.15, 0.75, 0.85
  @ legend on
  @ legend box on
  @ legend loctype view
  @ legend 0.78, 0.8
  @ legend length 2
  @ s0 legend "dH/d\xl\f{} \xl\f{} 0"
  @ s1 legend "\xD\f{}H \xl\f{} 0.05"
  0  19.3191 1.16531   1.8   -447.07  -47.07
  2 -447.072 -17.6454  1.5   -17.633  -1.33
  4 -17.633 -0.446508  1.3   -75.455  -5.45
  6 -75.4555 -2.83981  1.4   -28.724  -28.4
  8 -28.7246 -0.884639 1.5   -41.877  -14.87
  10 -41.8779 -1.45569  2.8   -43.685  -3.685
  12 -43.6851 -1.4797   -3.1  -91.651  -91.651
  14 -91.6515 -3.52492  -3.5  -61.135  -1.135
  16 -61.1356 -2.30129  -3.2  -48.847  -48.47

output file should look like this :-

  # This file was created Thu Oct  1 17:18:10 2015
  # by the following command:
  # /home/durba/gmx455/bin/mdrun -np 1 -deffnm md0 -v 
  #
  @    title "dH/d\xl\f{}, \xD\f{}H"
  @    xaxis  label "Time (ps)"
  @    yaxis  label "(kJ/mol)"
  @TYPE xy
  @ subtitle "T = 200 (K), \xl\f{} = 0"
  @ view 0.15, 0.15, 0.75, 0.85
  @ legend on
  @ legend box on
  @ legend loctype view
  @ legend 0.78, 0.8
  @ legend length 2
  @ s0 legend "dH/d\xl\f{} \xl\f{} 0"
  @ s1 legend "\xD\f{}H \xl\f{} 0.05"
  0  2  4 6 8 10 12 
  19.3191 -447.072 -17.633 -17.633 -75.4555 -28.7246 -41.8779 -43.6851 -91.6515 -61.1356
  1.16531 -17.6454 -0.446508 -2.83981 -0.884639 -1.45569 -1.4797 -3.52492 -2.30129
  1.8 1.5 1.3 1.4 1.5 2.8 -3.1 -3.5 -3.2
  -447.07 -17.633 -75.455 -28.724 -41.877 -43.685 -91.651 -61.135 -48.847
  -47.07 -1.33 -5.45 -28.4 -14.87 -3.685 -91.651 -1.135 -48.47

Please note that lines starting with "#" and "@" should be the same in both files.

Upvotes: 1

Views: 1011

Answers (2)

potong
potong

Reputation: 58361

This might work for you (GNU sed):

sed -r 'H;$!d;x;:a;h;s/\n(\S+)[^\n]*/\1 /g;s/ $//p;g;s/\n\S+ ?/\n/g;ta;d' file

Slurp the file into hold space (HS) deleting the pattern space (PS) until the end-of-file condition is met. At end-of-file swap the HS for the PS. Copy the PS to the HS and then remove all but the first field following a newline with the first field followed by a space, globally. Remove the last space and print the line. Then recall the copy of the line from the HS and do the inverse. If any of the substitutions were successful repeat the process until nothing but newlines exist. Delete the unwanted newlines.

Since first answering the original question changed. The new solution below caters for the new question using essentially the same method:

sed -r '/^[0-9]/{s/ +/ /g;H};//!p;$!d;x;:a;h;s/\n(\S+)[^\n]*/\1 /g;s/ $//p;g;s/\n\S+ ?/\n/g;ta;d' file

Upvotes: 1

John1024
John1024

Reputation: 113814

Answer for original question

Let's consider this test file:

$ cat file
123 1.2 1.3 1.4 1.5
124 2.2 2.3 2.4 2.5
125 3.2 3.3 3.4 3.5

To convert columns to row:

$ awk '{for (i=1;i<=NF;i++)a[i,NR]=$i} END{for (i=1;i<=NF;i++) for (j=1;j<=NR;j++) printf "%s%s",a[i,j],(j==NR?ORS:OFS)}' file
123 124 125
1.2 2.2 3.2
1.3 2.3 3.3
1.4 2.4 3.4
1.5 2.5 3.5

How it works

  • for (i=1;i<=NF;i++)a[i,NR]=$i

    As we loop through each line, we save the values in array a.

  • END{for (i=1;i<=NF;i++) for (j=1;j<=NR;j++) printf "%s%s",a[i,j],(j==NR?ORS:OFS)}

    After we reach the end of the file, we print each of the values followed by the output field separator (OFS) if we are in the midst of a line or the output record separator (ORS) if we are at the end of the line.

Multi-line version

If you like your code spread over several lines:

awk '
{
  for (i=1;i<=NF;i++)
    a[i,NR]=$i
}

END{
  for (i=1;i<=NF;i++)
    for (j=1;j<=NR;j++)
      printf "%s%s",a[i,j],(j==NR?ORS:OFS)
}
' file

Answer for revised question

In the revised question, there are lines at the beginning of the file that start with @ or # that are not to be changed. In this case:

$ awk '/^[@#]/{print;next}{k++; for (i=1;i<=NF;i++)a[i,k]=$i;} END{for (i=1;i<=NF;i++) for (j=1;j<=k;j++) printf "%s%s",a[i,j],(j==k?ORS:OFS)}' input
# This file was created Thu Oct  1 17:18:10 2015
# by the following command:
# /home/durba/gmx455/bin/mdrun -np 1 -deffnm md0 -v 
#
#
#
@    title "dH/d\xl\f{}, \xD\f{}H"
@    xaxis  label "Time (ps)"
@    yaxis  label "(kJ/mol)"
@TYPE xy
@ subtitle "T = 200 (K), \xl\f{} = 0"
@ view 0.15, 0.15, 0.75, 0.85
@ legend on
@ legend box on
@ legend loctype view
@ legend 0.78, 0.8
@ legend length 2
@ s0 legend "dH/d\xl\f{} \xl\f{} 0"
@ s1 legend "\xD\f{}H \xl\f{} 0.05"
0 2 4 6 8 10 12 14 16
19.3191 -447.072 -17.633 -75.4555 -28.7246 -41.8779 -43.6851 -91.6515 -61.1356
1.16531 -17.6454 -0.446508 -2.83981 -0.884639 -1.45569 -1.4797 -3.52492 -2.30129
1.8 1.5 1.3 1.4 1.5 2.8 -3.1 -3.5 -3.2
-447.07 -17.633 -75.455 -28.724 -41.877 -43.685 -91.651 -61.135 -48.847
-47.07 -1.33 -5.45 -28.4 -14.87 -3.685 -91.651 -1.135 -48.47

Upvotes: 4

Related Questions