I have a file of multi-line records. What I want to do is sort the records by type and then by the 6-digit number on the HEADER1 line.
Here are the records:
HEADER1|TYPE1|123456|JOHN SMITH
INFO|M|34|SINGLE
INFO|SGT
STATUS|KIA
MSG|NONE
HEADER1|TYPE3|654123|DANICA CLYNE
INFO|F|20|SINGLE
STATUS|MIA
MSG|HELP
MSG1||
HEADER1|TYPE2|987456|NIDALEE LANE
INFO|F|26|MARRIED
STATUS|INJURED
MSG|NONE
HEADER1|TYPE1|123456|JOHN CONNOR
INFO|M|34|SINGLE
STATUS|KIA
MSG|NONE
HEADER1|TYPE4|123789|CAITLYN MIST
INFO|F|19|SINGLE
INFO|||
STATUS|NONE
MSG|NONE
HEADER1|TYPE2|987456|NIDALEE CROSS
INFO|F|26|MARRIED
STATUS|INJURED
MSG|NONE
The output should look like this, with whole records (each HEADER1 line plus the lines under it) sorted by type and then by number:
HEADER1|TYPE1|123456|JOHN SMITH
INFO|M|34|SINGLE
INFO|SGT
STATUS|KIA
MSG|NONE
HEADER1|TYPE1|123456|JOHN CONNOR
INFO|M|34|SINGLE
STATUS|KIA
MSG|NONE
HEADER1|TYPE2|987456|NIDALEE LANE
INFO|F|26|MARRIED
STATUS|INJURED
MSG|NONE
HEADER1|TYPE2|987456|NIDALEE CROSS
INFO|F|26|MARRIED
STATUS|INJURED
MSG|NONE
HEADER1|TYPE3|654123|DANICA CLYNE
INFO|F|20|SINGLE
STATUS|MIA
MSG|HELP
MSG1||
HEADER1|TYPE4|123789|CAITLYN MIST
INFO|F|19|SINGLE
INFO|||
STATUS|NONE
MSG|NONE
Using List::MoreUtils 'apply' and setting the input record separator ($/) to "HEADER", the code could look like this:
#!/usr/bin/perl
use strict;
use warnings;
use List::MoreUtils qw/ apply /;
my $fname = 'dup_data.txt';
open (my $input_fh, '<', $fname) or die "Unable to read '$fname' because $!";
open (my $OUTPUTA, ">", $fname .".reformat")
or die "$0: could not write to '$fname.reformat'. $!";
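{
# Each read now returns the text up to and including the next "HEADER"
local $/ = "HEADER";
# Put back the "HEADER" prefix (chomp removed it as the previous chunk's separator)
print $OUTPUTA map{ "HEADER$_->[0]"}
# Sort by the TYPE number, then by the 6-digit number
sort {$a->[1] <=> $b->[1] || $a->[2] <=> $b->[2]}
# Pair each record with its sort keys: [record, type, number]
map {[$_, /TYPE(\d+)\|(\d+)/]}
# chomp strips the trailing $/ ("HEADER"); grep drops the empty first chunk
grep $_, apply {chomp} <$input_fh>;
}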
close $input_fh or die $!;
close $OUTPUTA or die $!;
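To see what setting $/ to "HEADER" does on its own, here is a small sketch that reads from an in-memory filehandle instead of a file. The sub name chunks_by_header and the sample string are hypothetical. Each read returns the text up to and including the next "HEADER", chomp strips that trailing separator, and the first chunk comes back empty because the input starts with the separator.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Split text into chunks the same way as the answer above:
# read with $/ set to "HEADER", then chomp off the separator.
sub chunks_by_header {
    my ($text) = @_;
    open(my $fh, '<', \$text) or die "Cannot open in-memory handle: $!";
    local $/ = "HEADER";
    my @chunks = <$fh>;      # list context: read every chunk
    chomp @chunks;           # chomp removes a trailing $/, i.e. "HEADER"
    close $fh;
    return grep { length } @chunks;   # drop the empty leading chunk
}

my $sample = "HEADER1|TYPE2|987456|A\nMSG|NONE\nHEADER1|TYPE1|123456|B\nMSG|NONE\n";
print "[$_]\n" for chunks_by_header($sample);
```

Note that the string "HEADER" appearing anywhere else inside a record would also be treated as a record boundary.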
Here's my solution.
#!/usr/bin/perl
use warnings;
use strict;
# Read in the file
open(my $fh, '<', "./record.txt") or die $!;
my @lines = <$fh>;
my @records;
# populate @records with each element having 4 lines
for ( my $index = 0; $index < scalar @lines; $index+=4 ) {
push @records, join("", ($lines[$index], $lines[$index+1], $lines[$index+2], $lines[$index+3]));
}
# sort by type and then by numbers
@records = map { $_->[0] }
sort { $a->[1] cmp $b->[1] || $a->[2] cmp $b->[2] }
map { [ $_ , (split('\|', $_))[1], (split('\|', $_))[2] ] }
@records;
print @records; # "@records" would put a space between records
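The map/sort/map chain above is the Schwartzian transform: each record is paired with its sort keys once, the pairs are sorted, and the keys are discarded afterwards. A minimal self-contained sketch of the same idea, using a hypothetical sub sort_records and two sample records, comparing the type as a string and the number numerically:

```perl
use strict;
use warnings;

# Sort records by TYPE (string) and then by the number (numeric),
# extracting each record's keys only once (Schwartzian transform).
sub sort_records {
    my @records = @_;
    return map  { $_->[0] }                                     # 3. keep only the record
           sort { $a->[1] cmp $b->[1] || $a->[2] <=> $b->[2] }  # 2. compare the keys
           map  { [ $_, (split /\|/, $_)[1, 2] ] }              # 1. [record, type, number]
           @records;
}

my @sorted = sort_records(
    "HEADER1|TYPE2|987456|NIDALEE LANE\nMSG|NONE\n",
    "HEADER1|TYPE1|123456|JOHN SMITH\nMSG|NONE\n",
);
print @sorted;
```

Splitting once per record instead of inside the comparator matters more as the number of records grows, since the comparator runs many more times than there are records.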
Here's an updated version with the same idea. It splits the text on "HEADER1" instead of assuming that every record has 4 lines:
#!/usr/bin/perl
use warnings;
use strict;
open(my $fh, '<', "./record.txt") or die $!;
my @lines = <$fh>;
my $temp = join ("", @lines);
my @records = split("HEADER1", "$temp");
my @new_records;
for my $rec (@records){
push @new_records, "HEADER1" . $rec;
}
shift @new_records;
@records = map { $_->[0] }
sort { $a->[1] cmp $b->[1] || $a->[2] cmp $b->[2] }
map { [ $_ , (split('\|', $_))[1], (split('\|', $_))[2] ] }
@new_records;
print @records; # "@records" would put a space between records
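Instead of splitting on "HEADER1", re-adding the prefix and shifting off the empty first element, a zero-width lookahead can split the text just before each HEADER1 line so the header stays inside its record. A sketch, assuming every record starts with a line beginning "HEADER1|"; split_records is a hypothetical helper name:

```perl
use strict;
use warnings;

# Split slurped text into records just before each line that starts
# with "HEADER1|". The lookahead is zero-width, so no text is removed,
# and a zero-length match at the very start produces no empty field.
sub split_records {
    my ($text) = @_;
    return split /^(?=HEADER1\|)/m, $text;
}

my $text = "HEADER1|TYPE3|654123|DANICA CLYNE\nMSG|HELP\nMSG1||\n"
         . "HEADER1|TYPE1|123456|JOHN SMITH\nINFO|SGT\n";
my @records = split_records($text);
print scalar(@records), " records\n";
```

The /m modifier makes ^ match at the start of every line, not only at the start of the string.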
If you don't care about performance, and every "record" consists of 4 lines:
# Read via <> (files named on the command line, or STDIN), since the question didn't specify the input
my $line_index = 0;
my (@records, @record);
# Slurp in all records into array of quadruplets
while (<>) {
if (0 == $line_index) {
push @records, [];
};
$records[-1]->[$line_index] = $_; # -1 lets you access last element of array.
$line_index++;
$line_index = 0 if $line_index == 4; # better done via "%"
}
# Sort the array. Since we sort by type+id,
# we can simply sort the first strings alphabetically.
my @records_sorted = sort { $a->[0] cmp $b->[0] } @records;
foreach my $record (@records_sorted) {
print join("", @$record); # Newlines never stripped, no need to append
}
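The comment about "%" refers to wrapping the line index with the modulo operator instead of resetting it by hand. A sketch of that loop, still assuming exactly four lines per record; group_by_four is a hypothetical name, and it takes a list of lines instead of reading <> so it can be run on its own:

```perl
use strict;
use warnings;

# Group lines into fixed-size records of 4 lines each, using % to
# decide when a new record starts.
sub group_by_four {
    my @lines = @_;
    my @records;
    for my $i (0 .. $#lines) {
        push @records, [] if $i % 4 == 0;   # lines 0, 4, 8, ... start a new record
        push @{ $records[-1] }, $lines[$i];
    }
    return @records;
}

my @recs = group_by_four(map("line $_\n", 1 .. 8));
print scalar(@recs), " records\n";
```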
If you're more adventurous, use List::MoreUtils::natatime:
use List::MoreUtils qw/natatime/;
use File::Slurp ();
my @lines = File::Slurp::read_file("my_file.txt");
my $it = natatime 4, @lines;
my @records;
while (my @record = $it->()) {
push @records, [@record]; # copy, so each record gets its own array
}
my @records_sorted = sort { $a->[0] cmp $b->[0] } @records;
foreach my $record (@records_sorted) {
print join("", @$record);
}
Another option for creating @records from @lines is List::Gen:
use List::Gen qw/by/;
foreach my $record (by 4 => @lines) {
push @records, $record;
}
Note that the code above assumes that all the numbers are 6 digits long, because it compares them as strings. If that's not the case, compare the number field numerically:
use List::Gen qw/by/;
use File::Slurp ();
my @lines = File::Slurp::read_file("my_file.txt");
my @records;
foreach my $record (by 4 => @lines) {
# Split the HEADER1 line on "|": [1] is the type, [2] is the number
my @sort_by = split(/\|/, $record->[0]);
push @records, [ $record, \@sort_by ];
}
my @records_sorted = sort {
$a->[1]->[1] cmp $b->[1]->[1]
|| $a->[1]->[2] <=> $b->[1]->[2]
} @records;
foreach my $record (@records_sorted) {
print join("", @{$record->[0]});
}
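The 6-digit assumption matters because cmp compares strings character by character, so numbers of different widths sort incorrectly, while <=> compares them as numbers. A quick check with made-up values:

```perl
use strict;
use warnings;

my @ids = (123456, 99, 1000);
my @as_strings = sort { $a cmp $b } @ids;   # character by character: 1000, 123456, 99
my @as_numbers = sort { $a <=> $b } @ids;   # by numeric value: 99, 1000, 123456
print "cmp: @as_strings\n";
print "<=>: @as_numbers\n";
```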
UPDATE: Since the OP clarified that a record may have any number of lines, here's the updated code:
my @records;
# Group lines into records; a new record starts at each HEADER1 line
while (<>) {
if (/^HEADER1\|/) {
my @sort_by = split(/\|/); # [1] is the type, [2] is the number
push @records, [[], \@sort_by];
};
push @{ $records[-1]->[0] }, $_;
}
my @records_sorted = sort {
$a->[1]->[1] cmp $b->[1]->[1]
|| $a->[1]->[2] <=> $b->[1]->[2]
} @records;
foreach my $record (@records_sorted) {
print join("", @{$record->[0]});
}
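For reference, a self-contained sketch of the same variable-length approach with strict and warnings, a header regex anchored to the start of the line, and use sort 'stable' so records with equal type and number (JOHN SMITH and JOHN CONNOR in the sample) keep their input order. The sub name sort_record_lines is hypothetical; it takes a list of lines rather than reading <> so it can be tested:

```perl
use strict;
use warnings;
use sort 'stable';   # keep input order for records with equal keys

# Group lines into records (each starts at a HEADER1 line) and sort
# them by type (string) and then by number (numeric).
sub sort_record_lines {
    my @records;
    for my $line (@_) {
        if ($line =~ /^HEADER1\|([^|]*)\|(\d+)\|/) {
            push @records, { type => $1, num => $2, lines => [] };
        }
        next unless @records;    # ignore anything before the first header
        push @{ $records[-1]{lines} }, $line;
    }
    my @sorted = sort { $a->{type} cmp $b->{type} || $a->{num} <=> $b->{num} } @records;
    return map { @{ $_->{lines} } } @sorted;
}

my @input = (
    "HEADER1|TYPE3|654123|DANICA CLYNE\n", "MSG|HELP\n",
    "HEADER1|TYPE1|123456|JOHN SMITH\n",   "INFO|SGT\n",
    "HEADER1|TYPE1|123456|JOHN CONNOR\n",  "STATUS|KIA\n",
);
print sort_record_lines(@input);
```

To run it on a file instead, replace the last line with print sort_record_lines(<>); and pass the file name on the command line.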