Please help me improve the formatting command below; it takes a lot of time. The input file is ** delimited, with 22 million rows and 87 columns.
In the output I need only 2 columns, print substr($3,0,15),substr($4,3,10), comma separated.
time zcat hlr*.gz | awk -F"**" '{OFS=","; print substr($3,0,15),substr($4,3,10)}' >Op_Formatted.csv
When I run the above command on Linux (uname: Linux), it takes about 5 hours 20 minutes:
real 319m48.471s
user 313m49.924s
sys 1m32.803s
whereas on Cygwin (uname: CYGWIN_NT-6.1) it takes under 17 minutes:
real 16m52.823s
user 17m35.485s
sys 0m6.986s
Sample Input:
2**000001**804421890831817F**819200000068FFFF**00** 0** 21- 10** 72- 1** 90- 32** 51- 1** 54- 1** 55- 1** 126- 5** 141- 44** 143- 1** 140- 58** 105- 0** 106- 0** 121- 4** 147- 1** 152- 1** 34- 0** 33- 4** 9- 1** 10- 1** 38- 1** 110- 1** 2- 1** 4- 1** 5- 1** 6- 1** 8- 1** 43- 1** 44- 1** 45- 1** 46- 1** 85- 0** 86- 4** 42- 0** 47- 0** 48- 0** 49- 0** 112- 1**9607500248789478**
2**000002**804421812449266F**819200000227FFFF**00** 0** 21- 10** 72- 1** 90- 32** 51- 1** 54- 1** 55- 1** 126- 5** 141- 44** 143- 1** 140- 5** 105- 0** 106- 0** 121- 4** 147- 1** 152- 1** 34- 0** 33- 7** 9- 1** 10- 1** 38- 1** 110- 1** 2- 1** 4- 1** 5- 1** 6- 1** 8- 1** 43- 1** 44- 1** 45- 1** 46- 1** 85- 0** 86- 4** 42- 0** 47- 0** 48- 0** 49- 0** 112- 1**4592140525164919**
2**000003**804421830628518F**819200000312FFFF**00** 0** 21- 10** 72- 1** 90- 35** 51- 1** 54- 1** 55- 1** 126- 5** 141- 44** 140- 58** 105- 0** 106- 0** 121- 4** 147- 1** 152- 1** 34- 0** 33- 4** 9- 1** 10- 1** 38- 1** 110- 1** 2- 1** 4- 1** 5- 1** 6- 1** 8- 1** 43- 1** 44- 1** 45- 1** 46- 1** 85- 0** 86- 4** 42- 0** 47- 0** 48- 0** 49- 0** 112- 1**6570980506503001**
Sample Output:
804421890831817,9200000068
804421812449266,9200000227
804421830628518,9200000312
Check whether your Linux environment has a memory or disk I/O problem; the command runs fine in my environment.
Here are some suggestions.
First, set OFS outside the action: in your command, OFS is reassigned on every input line.
zcat hlr*.gz | awk '{print substr($3,1,15),substr($4,3,10)}' FS='[*][*]' OFS=',' >Op_Formatted.csv
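A minimal sketch of the same extraction with both separators set once in a BEGIN block, wrapped in a hypothetical shell function extract_cols. Two details are my assumptions rather than part of the original command: the separator is written [*][*] so each asterisk is matched literally (as a regex, ** is ambiguous because * is a quantifier), and substr() starts at 1, because awk implementations disagree about a start of 0 (gawk, for instance, returns only 14 characters for substr($3,0,15), not the 15 shown in the sample output).

```shell
#!/bin/sh
# Hypothetical helper: reads "**"-delimited rows on stdin and prints the
# first 15 characters of field 3 and characters 3-12 of field 4.
extract_cols() {
  # FS and OFS are set once, before the first record is read.
  # "[*][*]" matches exactly two literal asterisks in any POSIX awk.
  awk 'BEGIN { FS = "[*][*]"; OFS = "," }
       { print substr($3, 1, 15), substr($4, 3, 10) }'
}

# Sample row 1, truncated after field 5 (only fields 3 and 4 are used).
printf '%s\n' '2**000001**804421890831817F**819200000068FFFF**00' | extract_cols
# prints: 804421890831817,9200000068
```

On the real data it would be used as zcat hlr*.gz | extract_cols >Op_Formatted.csv.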
If the field positions never change (every row has the same field widths), try cutting by fixed character offsets instead:
zcat hlr*.gz | awk '{print substr($0,12,15) "," substr($0,32,10)}' >Op_Formatted.csv
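The offsets 12 and 32 come from adding up the widths of the leading fields and the two-character ** separators. A small sketch that derives them from the first sample row (truncated after field 5), using a hypothetical field_offsets function; the result is only meaningful under the assumption above that every row has the same field widths.

```shell
#!/bin/sh
# Hypothetical helper: prints the 1-based start of field 3 and the start
# of the 10-character slice taken from field 4 (which skips its first 2 chars).
field_offsets() {
  awk 'BEGIN { FS = "[*][*]" } {
    sep = 2                                            # width of "**"
    start3 = length($1) + sep + length($2) + sep + 1   # 1+2+6+2+1 = 12
    start4 = start3 + length($3) + sep                 # 12+16+2 = 30
    print start3, start4 + 2                           # +2 skips "81"
  }'
}

row='2**000001**804421890831817F**819200000068FFFF**00'
printf '%s\n' "$row" | field_offsets
# prints: 12 32
```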
Timing of the first command on a 3000-line file:
real 0m0.297s
user 0m0.249s
sys 0m0.046s
Timing of the second command on the same file:
real 0m0.078s
user 0m0.077s
sys 0m0.030s
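Because the offset version is only correct when every row has identical field widths, a cheap sanity check is to run both approaches on a few rows and compare. A sketch on the three sample rows (truncated after field 5); it assumes substr($3,1,15) rather than a start of 0, since some awks return one character fewer for a start of 0 and the comparison would then fail for that reason alone.

```shell
#!/bin/sh
# Sketch: confirm the field-based and offset-based extractions agree.
sample='2**000001**804421890831817F**819200000068FFFF**00
2**000002**804421812449266F**819200000227FFFF**00
2**000003**804421830628518F**819200000312FFFF**00'

# Field-based: separators given as command-line assignment operands.
by_field=$(printf '%s\n' "$sample" |
  awk '{ print substr($3, 1, 15), substr($4, 3, 10) }' FS='[*][*]' OFS=',')

# Offset-based: fixed character positions within the whole record.
by_offset=$(printf '%s\n' "$sample" |
  awk '{ print substr($0, 12, 15) "," substr($0, 32, 10) }')

if [ "$by_field" = "$by_offset" ]; then
  echo "outputs match"
else
  echo "outputs differ"
fi
```

On the real data the same comparison could be fed from, for example, zcat hlr*.gz | head -n 100000 instead of the sample rows.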