user229691
user229691

Reputation: 53

Convert a fixed length file to csv file

How can i :

  1. Convert a fixed length file to a csv file.

  2. Split records from input file (fixed length file) as per the column length.

I have tried to convert the file using 'awk' however, the result is incorrect due to the spaces in records.

Input file:

4002000W1ABCDABCD7821 12345671LSN12301630 00000000000091640
00409164
4002000W1ABCDABCD7821 12345671LSN12301630 00000000000091640
00409164
4002000W1ABCDABCD7821 12345671LSN12301630 00000000000091640
00409164
4002000W1ABCDABCD7821 12345671LSN12301630 00000000000091640
00409164
4002000W1ABCDABCD7821 12345671LSN12301630 00000000000091640
004009164
4002000W1ABCDABCD7821 12345671LSN12301630 00000000000091640
004009164

The first record starts from 4002000W1ABCDABCD7821 and ends at 00409164

There are 6 records in total.

Input file contains 6 records for the table.

There are more than 40 columns in the record, I have mentioned only a few of them.

Columns are fixed length as follows:

ABC_ID(9), def_sc(8), sde_hd(8),mln_hfg(12), ghi_jkl(13),ijk_klm(6),pqr_xyz(10)

Expected output is as follows:

Output file:

ABC_ID,     def_sc, sde_hd,   mln_hfg,        ghi_jkl,     ijk_klm,  pqr_xyz
4002000W1, ABCDABCD,78211234, 56702291LSN1, 2301630000000, 000916, 4000409164
4002000W1, ABCDABCD,78211234, 56702291LSN1, 2301630000000, 000916, 4000409164
4002000W1, ABCDABCD,78211234, 56702291LSN1, 2301630000000, 000916, 4000409164
4002000W1, ABCDABCD,78211234, 56702291LSN1, 2301630000000, 000916, 4000409164
4002000W1, ABCDABCD,78211234, 56702291LSN1, 2301630000000, 000916, 4000409164
4002000W1, ABCDABCD,78211234, 56702291LSN1, 2301630000000, 000916, 4000409164

Is this possible to achieve using sed command.

Please suggest.

Upvotes: 1

Views: 2152

Answers (3)

αғsнιη
αғsнιη

Reputation: 2761

awk -v FIELDWIDTHS="9 8 8 12 13 6 10" 'NR%2{temp=$0;next;} {$0=temp$0; gsub(/ /,""); print $1,$2,$3,$4,$5,$6,$7}' OFS=','  file

Input file:

4002000W1ABCDABCD7821 123456702291LSN1230 16300000000009164
000409164
4002000W1ABCDABCD7821 123456702291LSN1230 16300000000009164
000409164
4002000W1ABCDABCD7821 123456702291LSN1230 16300000000009164
000409164
4002000W1ABCDABCD7821 123456702291LSN1230 16300000000009164
000409164
4002000W1ABCDABCD7821 123456702291LSN1230 16300000000009164
000409164
4002000W1ABCDABCD7821 123456702291LSN1230 16300000000009164
000409164

Out file:

4002000W1,ABCDABCD,78211234,56702291LSN1,2301630000000,000916,4000409164
4002000W1,ABCDABCD,78211234,56702291LSN1,2301630000000,000916,4000409164
4002000W1,ABCDABCD,78211234,56702291LSN1,2301630000000,000916,4000409164
4002000W1,ABCDABCD,78211234,56702291LSN1,2301630000000,000916,4000409164
4002000W1,ABCDABCD,78211234,56702291LSN1,2301630000000,000916,4000409164
4002000W1,ABCDABCD,78211234,56702291LSN1,2301630000000,000916,4000409164

For adding the first row just print it in first of all inside BEGIN{...}:

awk -v FIELDWIDTHS="9 8 8 12 13 6 10" 'BEGIN{print "ABC_ID,    def_sc,   sde_hd,   mln_hfg,        ghi_jkl,     ijk_klm,  pqr_xyz"} NR%2{temp=$0;next;} {$0=temp$0; gsub(/ /,""); print $1,$2,$3,$4,$5,$6,$7}' OFS=','  file

Explanation:

  • FIELDWIDTHS="9 8 8 12 13 6 10" specifies length of the fields to print.
  • NR%2{temp=$0;next;} store odd line in temp variable(will be use to join the pair of lines)
  • $0=temp$0 joins each consecutive lines. $0 is current line and temp is line before current line.
  • gsub(/ /,""); removes space characters.
  • print $1,$2,$3,$4,$5,$6,$7 prints the seven fields with predefined widths by FIELDWIDTHS

Upvotes: 1

Håkon Hægland
Håkon Hægland

Reputation: 40778

Here is a Perl solution:

use strict;
use warnings;

my @fmt = (9, 8, 8, 12, 13, 6, 10);
my @head = qw(ABC_ID def_sc sde_hd mln_hfg ghi_jkl ijk_klm pqr_xyz);

my $rec_len = do { my $sum; for(@fmt) { $sum += $_ }; $sum };

my $fn = 'file';
open(my $fh, '<', $fn) or die "Could not open file '$fn': $!\n";
my $str = do {local $/ = undef; <$fh>};
close($fh);
$str =~ s/\s*//g;
my $regex = join ("", map { "(.{$_})" } @fmt);

my $head_fmt = join (", ", map { "%-". $_ . "s", } @fmt) . "\n";
printf $head_fmt, @head;

while ( $str =~ /(.{$rec_len})/g ) {
    my @f = $1 =~ /$regex/;
    print join(", ", @f) . "\n";
}

Output:

ABC_ID   , def_sc  , sde_hd  , mln_hfg     , ghi_jkl      , ijk_klm, pqr_xyz   
4002000W1, ABCDABCD, 78211234, 5671LSN12301, 6300000000000, 009164, 0004091644
002000W1A, BCDABCD7, 82112345, 671LSN123016, 3000000000000, 091640, 0040916440
02000W1AB, CDABCD78, 21123456, 71LSN1230163, 0000000000000, 916400, 0409164400
2000W1ABC, DABCD782, 11234567, 1LSN12301630, 0000000000009, 164000, 4091644002
000W1ABCD, ABCD7821, 12345671, LSN123016300, 0000000000091, 640004, 0091644002

Upvotes: 0

Ed Morton
Ed Morton

Reputation: 204035

It's not entirely clear what you want but GNU awk for FIELDWIDTHS and multi-char RS is one option:

$ awk -v RS='^$' -v FIELDWIDTHS="9 8 8 8" -v OFS=', ' '{gsub(/\n/,""); print $1, $2, $3, $4}' file
4002000W1, ABCDABCD, 78211234, 56789071

Upvotes: 1

Related Questions