Reputation: 15
I have a file with 3972192 lines, each containing two tab-separated values. I would like to move every block of 47288 lines into a new column (which gives 84 columns). I read this other question (Put every N rows of input into a new column), which does what I want, but with awk I get:
awk: program limit exceeded: maximum number of fields size=32767
If I do it with pr, the limit on the number of columns is 36.
For this I first selected column 2 with awk:
awk '{print $2}' input_file>values_file
To get the first-column values I did:
awk '{print $1}' input_file>headers_file
head -n 47288 headers_file >headers_file2
Once I have both files I will put them together with paste:
paste headers_file2 values_file > Desired_output
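Before reshaping, it may be worth confirming that the line count really splits into whole blocks. The following is only a minimal sketch: `demo_input.txt` and the block size of 3 are hypothetical stand-ins for the real file and for 47288, and the last line just repeats the arithmetic with the numbers quoted above.

```shell
#!/bin/bash
# Hypothetical demo file: 6 lines standing in for the real 3972192-line input
input='demo_input.txt'
block_lines=3   # 47288 in the real case
printf 'A\t1\nB\t2\nC\t3\nA\t4\nB\t5\nC\t6\n' > "$input"

# Count lines, then check how many whole blocks fit and whether any lines are left over
total=$(wc -l < "$input")
blocks=$((total / block_lines))
leftover=$((total % block_lines))
echo "blocks=$blocks leftover=$leftover"

# Same arithmetic with the numbers from the question
echo "$((3972192 / 47288)) columns, remainder $((3972192 % 47288))"
```

A non-zero leftover would mean the last block is incomplete, so the last column would come out shorter than the others.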
Example input:
-Line1: ABCD 12
-Line2: ASDF 3435
...
-Line47288: QWER 345466
-Line47289: ABCD 456
...
-Line94576: QWER 25
...
-Line3972192: QWER 436
Desired output:
-Line1: ABCD 12 456 ....
...
-Line47288: QWER 345466 25 .... 436
Any advice? Thanks in advance.
Upvotes: 0
Views: 325
Reputation: 26
I assume every block follows the same pattern, i.e. the first column appears in the same order [ABCD ASDF ... QWER] in each block. If so, take the first column of the first block [47288 lines] and write it to the target file, then take the second column of each block and paste it onto the target file. I tried it with this tab-separated data file:
ABCD	1001
EFGH	1002
IJKL	1003
MNOP	1004
QRST	1005
UVWX	1006
ABCD	2001
EFGH	2002
IJKL	2003
MNOP	2004
QRST	2005
UVWX	2006
ABCD	3001
EFGH	3002
IJKL	3003
MNOP	3004
QRST	3005
UVWX	3006
ABCD	4001
EFGH	4002
IJKL	4003
MNOP	4004
QRST	4005
UVWX	4006
ABCD	5001
EFGH	5002
IJKL	5003
MNOP	5004
QRST	5005
UVWX	5006
And with this script:
#!/bin/bash
#lines per block, change to 47288
#(named BLOCKLINES because bash itself uses LINES for the terminal height)
BLOCKLINES=6
INPUT='data.txt'
#read from stdin so wc prints only the number, without the file name
TOTALLINES=$(wc -l < "$INPUT")
TOTALBLOCKS=$((TOTALLINES / BLOCKLINES))
#start the target file with the first column of the first block
head -n "$BLOCKLINES" "$INPUT" | cut -f 1 > target.txt
#get second column of each line, by blocks, and paste it into target file
BLOCK=1
while [ "$BLOCK" -le "$TOTALBLOCKS" ]
do
    #number of the last line of the current block
    HEADVALUE=$((BLOCK * BLOCKLINES))
    head -n "$HEADVALUE" "$INPUT" | tail -n "$BLOCKLINES" | cut -f 2 > tmpcol.txt
    cp target.txt targettmp.txt
    paste targettmp.txt tmpcol.txt > target.txt
    BLOCK=$((BLOCK+1))
done
#removing temp files
rm -f targettmp.txt tmpcol.txt
And I got this target file:
ABCD	1001	2001	3001	4001	5001
EFGH	1002	2002	3002	4002	5002
IJKL	1003	2003	3003	4003	5003
MNOP	1004	2004	3004	4004	5004
QRST	1005	2005	3005	4005	5005
UVWX	1006	2006	3006	4006	5006
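The same reshaping can probably also be done in a single pass with awk, which avoids re-reading the file once per block. This is only a sketch, not a tested solution for the full-size file: it assumes tab-separated input and the same key order in every block, and `demo_data.txt`, `target_awk.txt` and the block size of 3 are hypothetical demo values. Each input line still has only two fields, and the output rows are built up as plain strings, so it should not depend on how many fields awk allows per record.

```shell
#!/bin/bash
block_lines=3            # hypothetical demo value; 47288 in the question
input='demo_data.txt'    # hypothetical file name

# Small demo input: 2 blocks of 3 tab-separated lines each
printf 'ABCD\t1001\nEFGH\t1002\nIJKL\t1003\nABCD\t2001\nEFGH\t2002\nIJKL\t2003\n' > "$input"

awk -F'\t' -v n="$block_lines" '
{
    i = (NR - 1) % n              # position of this line inside its block
    if (NR <= n) row[i] = $1      # the first block supplies the key column
    row[i] = row[i] "\t" $2       # append this block value to its output row
}
END {
    # print the rows in their original order
    for (i = 0; i < n; i++) print row[i]
}' "$input" > target_awk.txt

cat target_awk.txt
```

This keeps one string per output row in memory (47288 strings in the real case) instead of creating temporary files.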
I hope this helps you.
Upvotes: 1