Reputation: 53
How can i :
Convert a fixed length file to a csv file.
Split records from input file (fixed length file) as per the column length.
I have tried to convert the file using 'awk' however, the result is incorrect due to the spaces in records.
Input file:
4002000W1ABCDABCD7821 12345671LSN12301630 00000000000091640
00409164
4002000W1ABCDABCD7821 12345671LSN12301630 00000000000091640
00409164
4002000W1ABCDABCD7821 12345671LSN12301630 00000000000091640
00409164
4002000W1ABCDABCD7821 12345671LSN12301630 00000000000091640
00409164
4002000W1ABCDABCD7821 12345671LSN12301630 00000000000091640
004009164
4002000W1ABCDABCD7821 12345671LSN12301630 00000000000091640
004009164
The first record starts from 4002000W1ABCDABCD7821 and ends at 00409164
There are 6 records in total.
Input file contains 6 records for the table.
There are more than 40 columns in the record, I have mentioned only a few of them.
Columns are fixed length as follows:
ABC_ID(9), def_sc(8), sde_hd(8),mln_hfg(12), ghi_jkl(13),ijk_klm(6),pqr_xyz(10)
Expected output is as follows:
Output file:
ABC_ID, def_sc, sde_hd, mln_hfg, ghi_jkl, ijk_klm, pqr_xyz
4002000W1, ABCDABCD,78211234, 56702291LSN1, 2301630000000, 000916, 4000409164
4002000W1, ABCDABCD,78211234, 56702291LSN1, 2301630000000, 000916, 4000409164
4002000W1, ABCDABCD,78211234, 56702291LSN1, 2301630000000, 000916, 4000409164
4002000W1, ABCDABCD,78211234, 56702291LSN1, 2301630000000, 000916, 4000409164
4002000W1, ABCDABCD,78211234, 56702291LSN1, 2301630000000, 000916, 4000409164
4002000W1, ABCDABCD,78211234, 56702291LSN1, 2301630000000, 000916, 4000409164
Is this possible to achieve using sed command.
Please suggest.
Upvotes: 1
Views: 2152
Reputation: 2761
awk -v FIELDWIDTHS="9 8 8 12 13 6 10" 'NR%2{temp=$0;next;} {$0=temp$0; gsub(/ /,""); print $1,$2,$3,$4,$5,$6,$7}' OFS=',' file
4002000W1ABCDABCD7821 123456702291LSN1230 16300000000009164
000409164
4002000W1ABCDABCD7821 123456702291LSN1230 16300000000009164
000409164
4002000W1ABCDABCD7821 123456702291LSN1230 16300000000009164
000409164
4002000W1ABCDABCD7821 123456702291LSN1230 16300000000009164
000409164
4002000W1ABCDABCD7821 123456702291LSN1230 16300000000009164
000409164
4002000W1ABCDABCD7821 123456702291LSN1230 16300000000009164
000409164
4002000W1,ABCDABCD,78211234,56702291LSN1,2301630000000,000916,4000409164
4002000W1,ABCDABCD,78211234,56702291LSN1,2301630000000,000916,4000409164
4002000W1,ABCDABCD,78211234,56702291LSN1,2301630000000,000916,4000409164
4002000W1,ABCDABCD,78211234,56702291LSN1,2301630000000,000916,4000409164
4002000W1,ABCDABCD,78211234,56702291LSN1,2301630000000,000916,4000409164
4002000W1,ABCDABCD,78211234,56702291LSN1,2301630000000,000916,4000409164
For adding the first row just print it in first of all inside BEGIN{...}
:
awk -v FIELDWIDTHS="9 8 8 12 13 6 10" 'BEGIN{print "ABC_ID, def_sc, sde_hd, mln_hfg, ghi_jkl, ijk_klm, pqr_xyz"} NR%2{temp=$0;next;} {$0=temp$0; gsub(/ /,""); print $1,$2,$3,$4,$5,$6,$7}' OFS=',' file
FIELDWIDTHS="9 8 8 12 13 6 10"
specifies length of the fields to print.NR%2{temp=$0;next;}
store odd line in temp
variable(will be use to join the pair of lines)$0=temp$0
joins each consecutive lines. $0
is current line and temp
is line before current line.gsub(/ /,"");
removes space characters.print $1,$2,$3,$4,$5,$6,$7
prints the seven fields with predefined widths by FIELDWIDTHS
Upvotes: 1
Reputation: 40778
Here is a Perl solution:
use strict;
use warnings;
my @fmt = (9, 8, 8, 12, 13, 6, 10);
my @head = qw(ABC_ID def_sc sde_hd mln_hfg ghi_jkl ijk_klm pqr_xyz);
my $rec_len = do { my $sum; for(@fmt) { $sum += $_ }; $sum };
my $fn = 'file';
open(my $fh, '<', $fn) or die "Could not open file '$fn': $!\n";
my $str = do {local $/ = undef; <$fh>};
close($fh);
$str =~ s/\s*//g;
my $regex = join ("", map { "(.{$_})" } @fmt);
my $head_fmt = join (", ", map { "%-". $_ . "s", } @fmt) . "\n";
printf $head_fmt, @head;
while ( $str =~ /(.{$rec_len})/g ) {
my @f = $1 =~ /$regex/;
print join(", ", @f) . "\n";
}
Output:
ABC_ID , def_sc , sde_hd , mln_hfg , ghi_jkl , ijk_klm, pqr_xyz
4002000W1, ABCDABCD, 78211234, 5671LSN12301, 6300000000000, 009164, 0004091644
002000W1A, BCDABCD7, 82112345, 671LSN123016, 3000000000000, 091640, 0040916440
02000W1AB, CDABCD78, 21123456, 71LSN1230163, 0000000000000, 916400, 0409164400
2000W1ABC, DABCD782, 11234567, 1LSN12301630, 0000000000009, 164000, 4091644002
000W1ABCD, ABCD7821, 12345671, LSN123016300, 0000000000091, 640004, 0091644002
Upvotes: 0
Reputation: 204035
It's not entirely clear what you want but GNU awk for FIELDWIDTHS and multi-char RS is one option:
$ awk -v RS='^$' -v FIELDWIDTHS="9 8 8 8" -v OFS=', ' '{gsub(/\n/,""); print $1, $2, $3, $4}' file
4002000W1, ABCDABCD, 78211234, 56789071
Upvotes: 1