chariko

Reputation: 15

Put every X rows of input into a new column

I have a file with 3972192 lines, each holding two tab-separated values. I would like to put every block of 47288 lines into a new column (which gives 84 columns). I read this other question (Put every N rows of input into a new column), which does what I want, but with awk I get:

awk: program limit exceeded: maximum number of fields size=32767

If I do it with pr, the limit on the number of columns is 36.

To do this, I first selected column 2 with awk:

awk '{print $2}' input_file>values_file

To get the first-column values I did:

awk '{print $1}' input_file>headers_file

head -n 47288 headers_file >headers_file2

Once I have both files I will put them together with paste:

paste headers_file2 values_file >Desired_output

Example: INPUT:

 -Line1:        ABCD     12

 -Line2:         ASDF     3435

...


-Line47288:     QWER     345466

-Line47289:     ABCD     456

...


-Line94576:     QWER     25

...

-Line3972192    QWER     436

DESIRED OUTPUT:

- Line1:         ABCD     12         456 ....

...

- Line47288:     QWER     345466     25  ....     436

Any advice? Thanks in advance.

Upvotes: 0

Views: 325

Answers (1)

xmaestre

Reputation: 26

I assume every block follows the same pattern, that is, the first column repeats in the same order [ABCD ASDF ... QWER] in each block. If so, take the first column of the first block [47288 lines] and write it to the target file. Then take the second column of each block and paste it onto the target file as a new column. I tried it with this data file:

ABCD    1001
EFGH    1002
IJKL    1003
MNOP    1004
QRST    1005
UVWX    1006
ABCD    2001
EFGH    2002
IJKL    2003
MNOP    2004
QRST    2005
UVWX    2006
ABCD    3001
EFGH    3002
IJKL    3003
MNOP    3004
QRST    3005
UVWX    3006
ABCD    4001
EFGH    4002
IJKL    4003
MNOP    4004
QRST    4005
UVWX    4006
ABCD    5001
EFGH    5002
IJKL    5003
MNOP    5004
QRST    5005
UVWX    5006

And with this script:


    #!/bin/bash

    # Number of lines per block; change to 47288 for the real file.
    # Named BLOCKSIZE because bash itself may set LINES to the terminal height.
    BLOCKSIZE=6
    INPUT='data.txt'
    TOTALLINES=$(wc -l < "$INPUT")
    TOTALBLOCKS=$((TOTALLINES / BLOCKSIZE))

    # Start the target file with the first column of the first block
    head -n "$BLOCKSIZE" "$INPUT" | cut -f1 > target.txt

    # Append the second column of each block as a new column of the target file
    BLOCK=1
    while [ "$BLOCK" -le "$TOTALBLOCKS" ]
    do
        HEADVALUE=$((BLOCK * BLOCKSIZE))
        head -n "$HEADVALUE" "$INPUT" | tail -n "$BLOCKSIZE" | cut -f2 > tmpcol.txt
        cp target.txt targettmp.txt
        paste targettmp.txt tmpcol.txt > target.txt
        BLOCK=$((BLOCK+1))
    done

    # Remove the temporary files
    rm -f targettmp.txt tmpcol.txt

And I got this target file:

ABCD    1001    2001    3001    4001    5001
EFGH    1002    2002    3002    4002    5002
IJKL    1003    2003    3003    4003    5003
MNOP    1004    2004    3004    4004    5004
QRST    1005    2005    3005    4005    5005
UVWX    1006    2006    3006    4006    5006
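
The loop above re-reads the input with head once per block, which may get slow with 84 blocks over almost 4 million lines. As an alternative, here is a minimal single-pass sketch in awk. It is not the method from the linked question: it builds each output row by string concatenation, and every input line still has only two fields, so it should not run into the field limit reported in the question. The function name reshape_blocks is hypothetical, and the sketch assumes tab-separated input whose blocks all repeat the labels in the same order (it prints a warning on stderr where they do not).

```shell
#!/bin/bash
# reshape_blocks BLOCKSIZE FILE
# Hypothetical helper (the name is not from the post): prints the first
# column of the first block, followed by the second column of every block.
reshape_blocks() {
    awk -F'\t' -v n="$1" '
        {
            i = (NR - 1) % n                 # row index within the current block
            if (NR <= n) {                   # first block supplies the labels
                label[i] = $1
                row[i] = $1
            } else if ($1 != label[i]) {     # assumption check: same label order
                print "line " NR ": label differs from first block" > "/dev/stderr"
            }
            row[i] = row[i] "\t" $2          # append the value as a new column
        }
        END {
            rows = (NR < n) ? NR : n         # handle inputs shorter than one block
            for (i = 0; i < rows; i++) print row[i]
        }
    ' "$2"
}
```

For the real file this would be called as `reshape_blocks 47288 input_file > Desired_output`. It keeps all 47288 output rows in memory until the end, which I would expect to be fine for a file of this size, but that is an assumption about your machine.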

I hope this helps you.

Upvotes: 1
