Reputation: 15
I have a problem that I can't solve. I read some words from a file in Perl. The words in that file aren't in order, but each has a number (as its first character) that gives the position the word should have in the sentence. A 0 means the word belongs at position [0], 1 means the word should be at position [1], etc.
The file looks like: 0This 3a 4sentence 2be 1should, and the solution should look like: 0This 1should 2be 3a 4sentence.
In a for loop I go through the words array that I get from the file, and this is how I get the first character (the number): $firstCharacter = substr $words[$i], 0, 1; but I don't know how to properly change the array.
Here's the code that I use:
#!/usr/bin/perl -w
$arg = $ARGV[0];
open FILE, "< $arg" or die "Can't open file: $!\n";
$/ = ".\n";
while($row = <FILE>)
{
chomp $row;
@words = split(' ',$row);
}
for($i = 0; $i < scalar @words; $i++)
{
$firstCharacter = substr $words[$i], 0, 1;
if($firstCharacter != 0)
{
}
}
Upvotes: 1
Views: 128
Reputation: 42099
You should be able to split on the spaces, which will make the numbers the first character of each word. With that assumption, you can simply compare using the numerical comparison operator (<=>) as opposed to the string comparison operator (cmp).
The choice of operator is important because strings are compared character by character, starting with the first, meaning 10, 11, and 12 would be out of order and listed next to the 1 (1,10,11,12,2,3,4… instead of 1,2,3,4…10,11,12).
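The difference is easy to see with plain numbers. The following is only a small sketch; the @positions list is made up for illustration and is not taken from the question's file.

```perl
use strict;
use warnings;

# Plain numbers, so the numeric comparison raises no warnings here.
my @positions = (1, 10, 2, 11, 3, 12, 4);

my @as_strings = sort { $a cmp $b } @positions;   # compared character by character
my @as_numbers = sort { $a <=> $b } @positions;   # compared by numeric value

print "cmp: @as_strings\n";   # cmp: 1 10 11 12 2 3 4
print "<=>: @as_numbers\n";   # <=>: 1 2 3 4 10 11 12
```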
Note: @schwern commented an important point. If you use warnings, and you should, you will receive warnings. This is because the values of the internal comparison variables, $a and $b, aren't numbers but strings (e.g., "0This", "3a"). I've updated the following Codepad and provided more suitable alternatives to avoid this issue.
use strict;
use warnings;
my $line = q{0This 3a 4sentence 2be 1should};
my @words = split /\s/,$line;
my @sorted = sort {$a <=> $b} @words;
print qq{
Line: $line
Words: @words
Sorted: @sorted
};
One method is to ignore the warning using no warnings 'numeric' as in Schwern's answer. As he has shown, turning the warning off inside a block means it is re-enabled afterwards, which may be a little more foolproof than Choroba's answer, which applies it to the broader scope.
Choroba's solution works by parsing the digits from those values inside the comparison. This is far fewer lines of code, but I would generally advise against it for performance reasons: the regex isn't run just once per word, but every time a word takes part in a comparison during the sort.
Another method is to strip the numbers out and use them for the sort comparison. I attempt to do this below by creating a hash, where the key is the number and the value is the word.
Once you have an array whose values are the words prefixed by the numbers, you can just as easily split each number/word combination into a key and a value for that hash. This is accomplished by using split.
The important thing to note about the split statement is that a limit is passed (in this case 2), which limits the maximum number of fields the string is split into.
The two values are then used in the map to build the key/value assignment. Thus "0This" is split into "0" and "This" to be used in the hash as "0"=>"This".
use strict;
use warnings;
my $line = q{0This 3a 4sentence 2be 1should};
my @words = split /\s/, $line; # [ '0This', '3a', ... ]
my %mapped = map { split /(?=\D)/, $_, 2 } @words; # { '0'=>'This', '3'=>'a', ... }
my @sorted = @mapped{ sort { $a <=> $b } keys %mapped }; # [ 'This', 'should', 'be', ... ]
print qq{
Line: $line
Words: @words
Sorted: @sorted
};
This also can be further optimized, but uses multiple variables to illustrate the steps in the process.
Upvotes: 1
Reputation: 164689
Assuming you have an array like this:
my @words = ('0This', '3a', '4sentence', '2be', '1should');
And you want it sorted like so:
('0This', '1should', '2be', '3a', '4sentence');
There are two steps to this. The first is extracting the leading number. The second is sorting by that number.
You can't use substr, because you don't know how long the number might be. For example, ('9Second', '12345First'). If you only looked at the first character you'd get 9 and 1 and sort them incorrectly.
Instead, you'd use a regex to capture the number.
my($num) = $word =~ /^(\d+)/;
See perlretut for more on how that works, particularly Extracting Matches.
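To make the difference concrete, here is a small sketch that runs both approaches on the two example words. The helper name leading_number is made up for this illustration, and it assumes every word starts with at least one digit.

```perl
use strict;
use warnings;

# Hypothetical helper: returns the whole leading number of a word.
# Assumes the word begins with at least one digit.
sub leading_number {
    my ($word) = @_;
    my ($num) = $word =~ /^(\d+)/;   # list context returns the captured group
    return $num;
}

for my $word ('9Second', '12345First') {
    my $first_char = substr $word, 0, 1;   # only ever a single character
    my $num        = leading_number($word);
    print "$word: substr gives $first_char, regex gives $num\n";
}
```

For '12345First' the substr version sees only 1, while the regex captures 12345, so only the regex version orders the two words correctly.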
Now that you can capture the numbers, you can sort by them. Rather than doing it in a loop yourself, let sort handle the sorting for you. All you have to do is supply the criterion for the sorting. In this case we capture the number from each word (assigned to $a and $b by sort) and compare them as numbers.
@words = sort {
# Capture the number from each word.
my($anum) = $a =~ /^(\d+)/;
my($bnum) = $b =~ /^(\d+)/;
# Compare the numbers.
$anum <=> $bnum
} @words;
There are various ways to make this more efficient, in particular the Schwartzian Transform.
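As a rough sketch of what a Schwartzian Transform could look like here: each word is paired with its number once, the pairs are sorted, and the words are then pulled back out, so the regex runs once per word rather than once per comparison. Falling back to 0 for a word with no leading number is an assumption made for this example, not something the question specifies.

```perl
use strict;
use warnings;

my @words = ('0This', '3a', '4sentence', '2be', '1should');

# Read the chain from the bottom up.
my @sorted =
    map  { $_->[1] }                      # 3. keep only the word
    sort { $a->[0] <=> $b->[0] }          # 2. compare the cached numbers
    map  { [ /^(\d+)/ ? $1 : 0, $_ ] }    # 1. pair [number, word]; 0 if no number (assumption)
    @words;

print "@sorted\n";   # 0This 1should 2be 3a 4sentence
```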
You can also cheat a bit.
If you ask Perl to treat something as a number, it will do its damnedest to comply. If the string starts with a number, it will use that and ignore the rest, though it will complain.
$ perl -wle 'print "23foo" + "42bar"'
Argument "42bar" isn't numeric in addition (+) at -e line 1.
Argument "23foo" isn't numeric in addition (+) at -e line 1.
65
We can take advantage of that to simplify the sort by just comparing the words as numbers directly.
{
no warnings 'numeric';
@words = sort { $a <=> $b } @words;
}
Note that I turned off the warning about using a word as a number. use warnings and no warnings only have effect within the current block, so by putting the no warnings 'numeric' and the sort in their own block I've only turned off the warning for that one sort statement.
Finally, if the words are in a file, one per line, you can use the Unix sort utility from the command line. Use -n for "numeric sorting" and it will do the same trick as above.
$ cat test.data
00This
3a
123sentence
2be
1should
$ sort -n test.data
00This
1should
2be
3a
123sentence
Upvotes: 3
Reputation: 241808
Just use sort. You can use a match in list context to extract the numbers. Using \d+ will work even for numbers > 9:
#! /usr/bin/perl
use warnings;
use strict;
use feature qw{ say };
my @words = qw( 0This 3a 4sentence 2be 1should );
say join ' ', sort { ($a =~ /\d+/g)[0] <=> ($b =~ /\d+/g)[0] } @words;
If you don't mind the warnings, or you are willing to turn them off, you can use numeric comparison directly on the words. Perl will extract the numeric prefixes itself:
no warnings 'numeric';
say join ' ', sort { $a <=> $b } @words;
Upvotes: 7