jimakos17

Reputation: 935

bash script Rows to Columns

How can I change a file that looks like this:

A   25
B   26
A   14
D   39
E   42

A   74
B   36
A   81
D   96
E   17

A   23
B   14
A   74
D   87
E   17

into a file where the labels appear as column headers only once, with their corresponding values listed in columns underneath, like this:

A    B    A    D    E
25   26   14   39   42
74   36   81   96   17
23   14   74   87   17

In my actual file the labels repeat every 29 rows, and some labels, like A, share the same name.

Upvotes: 1

Views: 810

Answers (5)

JJoao

Reputation: 5347

Or a little bit more regex-oriented:

perl -0pE  'say s/\s*\d+\h*\n|\n.*/ /sgr;  s/(^|\n)\w\s*/ /g' file
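If the one-liner is hard to follow, the same two substitutions can be laid out with comments (an illustrative reading of the command above, not part of the original answer):

perl -0pE '
    # Header row: on a throwaway copy (/r) of the slurped file, turn each
    # "<whitespace><number><newline>" into a space and, at the first blank
    # line, replace everything up to the end of the file with a space --
    # what survives ("A B A D E") is printed by say.
    say s/\s*\d+\h*\n|\n.*/ /sgr;

    # Data rows: in the real buffer (printed afterwards by -p), replace the
    # label at the start of every line, together with the line break in
    # front of it, with a single space, so each block collapses to one
    # line of numbers.
    s/(^|\n)\w\s*/ /g
' file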

Upvotes: 0

glenn jackman

Reputation: 246744

For fun, some opaque, perl-ish ruby:

ruby -00 -lane '
    # $F holds the fields of one paragraph: A 25 B 26 A 14 D 39 E 42;
    # even indices are the labels, odd indices are the numbers
    headers, values = $F.each_with_index.partition {|(v,i)| i.even?}
    puts headers.collect(&:first).join(" ") if $. == 1   # label row, first paragraph only
    puts values.collect(&:first).join(" ")               # one row of numbers per paragraph
' file

Upvotes: 1

hek2mgl

Reputation: 157947

You can use the following awk script to transform the file:

transform.awk:

{
    # On the first record this loop runs twice: once
    # for the headers and once for the first line of data.
    # On all subsequent records it prints only the data,
    # because h==1.
    for(;h<=1;h++){
        for(i=1+h;i<=NF;i+=2){
            printf "%s ", $i
        }
        printf "\n"
    }
    h=1
}

Then execute it like this:

awk -f transform.awk RS='' file

Output:

A B A D E 
25 26 14 39 42 
74 36 81 96 17 
23 14 74 87 17

To get properly aligned columns you can pipe the output to column -t:

awk -f transform.awk RS='' file | column -t

Output:

A   B   A   D   E
25  26  14  39  42
74  36  81  96  17
23  14  74  87  17

The key here is the use of the variable RS (record separator). Setting RS to an empty string makes awk separate records by blank lines, which is roughly the same as setting it to \n\n+ (one or more blank lines). The first record, for example, will look like this:

A   25
B   26
A   14 
D   39
E   42

By default, awk splits fields on runs of whitespace ([[:space:]]+), which includes newlines. This gives us the following fields for record one:

A 25 B 26 A 14 D 39 E 42

The algorithm shown above transforms these fields into the desired output.
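If you want to see that field layout for yourself, a small illustrative one-liner (not part of the original answer) prints each field of the first record together with its index:

awk 'NR == 1 { for (i = 1; i <= NF; i++) printf "field %d = %s\n", i, $i }' RS='' file

For the sample input this prints field 1 = A, field 2 = 25, and so on up to field 10 = 42.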

Upvotes: 5

David C. Rankin

Reputation: 84521

And just to round out the mix, you can do it in a fairly flexible manner with a simple script (limited to reading 2-column files formatted as your input file is shown). It will read the data from a filename given as the first argument (or from stdin by default).

The script simply reads column 1 and column 2 into separate indexed arrays (a1 and a2) until a blank line is encountered. If it is the first time through, it prints the heading row (and sets the heading flag h so the heading is not printed again), then prints the data in a2.

When the end of the file is reached, it simply prints the final row of data.

#!/bin/bash

# read from the filename given as the first argument, or from stdin by default
fname="${1:-/dev/stdin}"

declare -i h=0   # heading flag: set to 1 once the heading row has been printed
declare -a a1    # column-1 labels of the current block
declare -a a2    # column-2 values of the current block

while read -r line; do
    if [ "$line" != "" ]; then
        a1+=( "${line%% *}" )   # first field (label)
        a2+=( "${line##* }" )   # last field (value)
    else
        # blank line: end of a block -- print the heading once, then the values
        [ "$h" -eq 0 ] && { printf " %2s" "${a1[@]}"; echo ""; h=1; }
        printf " %2s" "${a2[@]}"
        echo ""
        unset a1 a2
    fi
done < "$fname"

# the last block is not followed by a blank line, so print its values here
printf " %2s" "${a2[@]}"
echo ""

Use/Output

$ bash r2c.sh dat/r2c.txt
  A  B  A  D  E
 25 26 14 39 42
 74 36 81 96 17
 23 14 74 87 17

Upvotes: 0

karakfa

Reputation: 67467

An alternative to the awk solution, using other Unix tools (rather extensively):

$ sed '/^$/d' file    | 
  pr -3ts' '          | 
  tr '\t' ' '         | 
  tr -s ' '           | 
  cut -d' ' -f1,2,4,6 | 
  tr ' ' '\n'         | 
  pr -5ts' '          |
  column -t

Output:

A   B   A   D   E
25  26  14  39  42
74  36  81  96  17
23  14  74  87  17

The first magic number, 3, is the number of repeated sections (equivalently, the number of output rows excluding the header); the second magic number, 5, is the number of items in each section (equivalently, the number of columns).
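If you would rather not hard-code those two numbers, they could be derived from the file first. A minimal sketch, assuming the same blank-line-separated layout as in the question (the variable names sections and columns are just illustrative):

# number of blank-line-separated sections (3 for the sample input)
sections=$(awk -v RS= 'END { print NR }' file)

# label/value pairs per section, i.e. the number of output columns (5 here)
columns=$(awk -v RS= 'NR == 1 { print NF / 2; exit }' file)

These values could then replace the 3 and 5 in the two pr calls; the field list given to cut would still have to be adjusted by hand to match the number of sections.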

Upvotes: 2
