Reputation: 23

grep, cut, sed, awk a file for 3rd column, n lines at a time, then paste into repeated columns of n rows?

I have a file of the form:

#some header text
a    1       1234
b    2       3333
c    2       1357

#some header text 
a    4       8765
b    1       1212
c    7       9999
...

with repeated data in n-row chunks separated by a blank line (possibly with some other header text). I'm only interested in the third column, and would like to do some grep, cut, awk, sed, paste magic to turn it into this:

a   1234    8765   ...
b   3333    1212
c   1357    9999

where the third column of each subsequent n-row chunk is tacked on as a new column. I guess you could call it a transpose, just n lines at a time, and only for a specific column. The leading (a b c) label column isn't essential; I'd be happy if I could just grab the data in the third column.

Is this even possible? It must be. I can get things chopped down to only the interesting columns with grep and cut:

cat myfile | grep -A2 ^a\  | cut -c13-15

but I can't figure out how to take these n-row chunks and sed/paste/whatever them into repeated n-row columns.

Any ideas?
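
For reference, the chunk-and-paste idea described above can be sketched with awk plus paste. This is a minimal sketch, not taken from any of the answers: it assumes header lines start with #, every chunk has exactly n data rows, and the helper name paste_chunks and its arguments are hypothetical. awk writes the third column of each chunk to its own temporary file, and paste then lays those files side by side.

```shell
# Hypothetical helper: paste_chunks N FILE
# Assumes '#'-prefixed header lines and exactly N data rows per chunk.
paste_chunks() {
    local n=$1 file=$2 tmp
    tmp=$(mktemp -d) || return 1
    # awk splits on runs of whitespace, so $3 is the third column regardless of spacing.
    # Each chunk goes to its own zero-padded file, so the glob below expands in chunk order.
    awk -v n="$n" -v dir="$tmp" '
        NF && !/^#/ {
            f = sprintf("%s/chunk_%06d", dir, int(count / n))
            print $3 > f
            if (++count % n == 0) close(f)
        }' "$file"
    # One output column per chunk, joined with single spaces.
    paste -d ' ' "$tmp"/chunk_*
    rm -r "$tmp"
}
```

Usage: paste_chunks 3 myfile prints 1234 8765, 3333 1212 and 1357 9999 on three lines for the sample above.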

Upvotes: 1

Answers (4)

Ed Morton

Reputation: 203665

$ awk -v RS= -F'\n' '{ for (i=2;i<=NF;i++) {split($i,f,/[[:space:]]+/); map[f[1]] = map[f[1]] " " f[3]} } END{ for (key in map) print key map[key]}' file
a 1234 8765
b 3333 1212
c 1357 9999
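
This command relies on awk's paragraph mode: with RS empty, each blank-line-separated chunk is read as one record, and -F'\n' makes every line of that chunk a field. Because every chunk lists its rows in the same order, the same idea could also join on the row position inside the chunk instead of on the label, which drops the label column and prints rows in input order (for (key in map) has no guaranteed order). A minimal sketch, assuming header lines start with # and data lines have no leading whitespace; transpose_chunks is a hypothetical name:

```shell
# Hypothetical helper: transpose_chunks FILE
# RS= puts awk in paragraph mode: each blank-line-separated chunk is one record,
# and -F'\n' makes every line of the chunk a separate field.
transpose_chunks() {
    awk -v RS= -F'\n' '
        {
            row = 0
            for (i = 1; i <= NF; i++) {
                if ($i ~ /^#/ || $i !~ /[^[:space:]]/) continue  # skip header/empty lines
                split($i, f, /[[:space:]]+/)
                row++                                  # position of this data line in the chunk
                if (row in out) out[row] = out[row] " " f[3]
                else            out[row] = f[3]
                if (row > nrows) nrows = row
            }
        }
        END { for (r = 1; r <= nrows; r++) print out[r] }
    ' "$1"
}
```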

Upvotes: 0

rpax

Reputation: 4496

Using bash 4.0 or later (needed for associative arrays):

declare -A array
# read splits each line on runs of whitespace: label, 2nd column (ignored), value
while read -r c _ value _
do
   # skip blank lines and "#" header lines
   if [[ $c && $c != \#* ]]; then
       array[$c]="${array[$c]} $value"
   fi
done < myFile.txt

for k in "${!array[@]}"
do
    echo "$k ${array[$k]}"
done

Will produce:

a  1234 8765
b  3333 1212
c  1357 9999

It stores the letter as the key of an associative array and, for each data line, appends the corresponding value to it.
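
Bash does not guarantee any particular iteration order for the keys of an associative array, so the rows may not come out as a, b, c. A minimal sketch of one way to keep input order, assuming bash 4.0 or later and #-prefixed header lines: remember each key in an ordinary indexed array the first time it is seen. The function name collect_in_order is hypothetical.

```shell
#!/usr/bin/env bash
# Hypothetical helper: collect_in_order FILE (needs bash >= 4.0 for associative arrays)
collect_in_order() {
    local -A values=()
    local -a order=()
    local key col2 value rest k
    # read splits on runs of blanks, so repeated spaces between columns are fine
    while read -r key col2 value rest; do
        [[ -z $key || $key == \#* ]] && continue       # skip blank and '#' header lines
        # first time this key is seen: record it to preserve input order
        [[ -n ${values[$key]+set} ]] || order+=("$key")
        values[$key]+=" $value"
    done < "$1"
    for k in "${order[@]}"; do
        printf '%s%s\n' "$k" "${values[$k]}"
    done
}
```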

Upvotes: 0

konsolebox

Reputation: 75498

awk '/#/ || !NF{next} {a[$1] = a[$1] $3 "\t"} END{for(i in a){print i, a[i]}}' file

This produces:

a 1234  8765
b 3333  1212
c 1357  9999

You can change "\t" to a different output separator like " " if you like.

If you don't want the trailing tab, insert sub(/\t$/, "", a[i]); before the print. Another option is to check whether a[$1] already has a value and only add the separator in that case; that complicates the code a bit, though.
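
The second option, adding the separator only when a[$1] already holds a value, might look like this sketch. It assumes header lines start with #, and join_third_column is a hypothetical name. As with the original, for (i in a) does not guarantee the output order.

```shell
# Hypothetical helper: join_third_column FILE
join_third_column() {
    awk '
        /^#/ || NF == 0 { next }        # skip header and blank lines
        {
            # add the tab separator only when the key already has a value,
            # so no trailing separator is left at the end of the line
            if ($1 in a) a[$1] = a[$1] "\t" $3
            else         a[$1] = $3
        }
        END { for (i in a) print i, a[i] }
    ' "$1"
}
```

Piping the result through sort gives a stable a, b, c order.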

Upvotes: 1

anubhava

Reputation: 785246

This awk does the job:

awk 'NF<3 || /^(#|[[:blank:]]*$)/{next} !($1 in a){b[++k]=$1; a[$1]=$3; next}
        {a[$1] = a[$1] OFS $3} END{for(i=1; i<=k; i++) print b[i], a[b[i]]}' file
a 1234 8765
b 3333 1212
c 1357 9999
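
Here the b array records each key the first time it is seen, so the END loop prints rows in input order rather than in the unspecified for (i in a) order. Since the question says the a b c labels are optional, a variant of the same idea that prints only the collected values could look like this sketch; third_col_only and its optional column argument are hypothetical, and header lines are assumed to start with #.

```shell
# Hypothetical helper: third_col_only FILE [COLUMN]  (COLUMN defaults to 3)
third_col_only() {
    awk -v col="${2:-3}" '
        NF < col || /^#/ { next }                        # skip short, blank and header lines
        !($1 in a) { order[++k] = $1; a[$1] = $col; next } # first sighting: remember order
        { a[$1] = a[$1] OFS $col }
        END { for (i = 1; i <= k; i++) print a[order[i]] }  # values only, no label
    ' "$1"
}
```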

Upvotes: 1
