Reputation: 41
I am trying to split text into "steps". Let's say my text is
my $steps = "1.Do this. 2.Then do that. 3.And then maybe that. 4.Complete!";
I'd like the output to be:
"1.Do this."
"2.Then do that."
"3.And then maybe that."
"4.Complete!"
I'm not really that good with regex so help would be great!
I've tried many combinations, like:
split /(\s\d.)/
But that splits the numbering away from the text.
Upvotes: 1
Views: 89
Reputation: 66924
All step descriptions start with a number followed by a period, and then contain non-digits until the next number. So capture all such patterns:
my @s = $steps =~ / [0-9]+\. [^0-9]+ /xg;
say for @s;
This works only if the step descriptions are certain to contain no digits. The same caveat applies to any approach that relies on matching a number, even a number followed by a period, because of decimal numbers.†
If there may be numbers in there, we'd need to know more about the structure of the text.
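To make the caveat concrete, here is a small sketch with made-up input (the text in $steps below is an assumption, not from the question) in which one description contains a number. The pattern stops at the first digit, so the rest of step 2 is silently dropped.

```perl
use strict;
use warnings;
use feature 'say';

# Hypothetical input: step 2 has a number inside its description.
my $steps = "1.Do this. 2.Wait 10 seconds. 3.Complete!";

# Same pattern as above: number, period, then a run of non-digits.
my @s = $steps =~ / [0-9]+\. [^0-9]+ /xg;

# Brackets make the item boundaries (and trailing spaces) visible.
say "[$_]" for @s;
# [1.Do this. ]
# [2.Wait ]
# [3.Complete!]
```

Note that "10 seconds." is lost entirely, and that the matched items keep their trailing space.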
Another delimiting pattern to consider is the punctuation that ends a sentence ("." and "!" in these examples), provided that no such characters appear inside a step's description and no step has multiple sentences:
my @s = $steps =~ / [0-9]+\. .*? [.!] /xg;
Augment the list of patterns that end an item's description as needed, say with a "?", and/or with a '."' sequence, since punctuation often goes inside quotes.‡
If an item can have multiple sentences, or can use end-of-sentence punctuation mid-sentence (as part of a quotation, perhaps), then tighten the condition for an item's end by combining the two approaches from the footnotes: end-of-sentence punctuation that is followed by number+period (or by the end of the string)
my @s = $steps =~ /[0-9]+\. .*? (?: \."|\!"|[.\!]) (?=\s+[0-9]+\. | \z)/xg;
If this isn't good enough either then we'd really need a more precise description of that text.
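As a quick check of the combined pattern, here is a sketch run on made-up text (the contents of $steps are an assumption for illustration) where one step has two sentences and another has punctuation inside quotes.

```perl
use strict;
use warnings;
use feature 'say';

# Hypothetical input: step 1 has two sentences, step 2 has a quoted "!".
my $steps = q{1.Do this. Check it twice. 2.Say "done!" aloud. 3.Complete!};

# An item ends at sentence-ending punctuation only when that is followed
# by whitespace + number + period, or by the end of the string.
my @s = $steps =~ /[0-9]+\. .*? (?: \."|\!"|[.\!]) (?=\s+[0-9]+\. | \z)/xg;

say "[$_]" for @s;
# [1.Do this. Check it twice.]
# [2.Say "done!" aloud.]
# [3.Complete!]
```

The lookahead is what keeps "Do this." and the quoted "done!" from ending their items early.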
† An approach using a "numbers-period" pattern to delimit item's description, like
/ [0-9]+\. .*? (?=\s+[0-9]+\. | \z) /xg;
(or in a lookahead in split) fails with text like "1. Only $2.50" or "1. Version 2.4.1" ...
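A small sketch of that failure, using made-up text with a version number (the input string is an assumption for illustration):

```perl
use strict;
use warnings;
use feature 'say';

# Hypothetical input: two steps, the first mentions a version number.
my $steps = '1. Install version 2.4.1 now. 2. Restart.';

# Items are delimited only by "whitespace + number + period" (or end of string).
my @s = $steps =~ / [0-9]+\. .*? (?=\s+[0-9]+\. | \z) /xg;

say "[$_]" for @s;
# [1. Install version]
# [2.4.1 now.]
# [2. Restart.]
```

The " 2." at the start of "2.4.1" looks exactly like the start of a new item, so two steps come back as three.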
‡ To include text like 1. Do "this." and 2. Or "that!" we'd want
/ [0-9]+\. .*? (?: \." | !" | [.!?]) /xg;
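A sketch comparing the plain punctuation pattern with the augmented one, on made-up text (the input string is an assumption for illustration):

```perl
use strict;
use warnings;
use feature 'say';

# Hypothetical input: punctuation inside quotes, and a step ending in "?".
my $steps = q{1. Do "this." 2. Or "that!" 3. Done?};

# Plain version: an item ends at the first "." or "!".
my @plain = $steps =~ / [0-9]+\. .*? [.!] /xg;

# Augmented version: also accept ." and !" and "?" as item endings.
my @quoted = $steps =~ / [0-9]+\. .*? (?: \." | !" | [.!?]) /xg;

say "plain:  [$_]" for @plain;
# plain:  [1. Do "this.]
# plain:  [2. Or "that!]
say "quoted: [$_]" for @quoted;
# quoted: [1. Do "this."]
# quoted: [2. Or "that!"]
# quoted: [3. Done?]
```

The plain pattern cuts off the closing quotes and misses step 3 altogether, since "?" is not in its list of endings.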
Upvotes: 3
Reputation: 6808
The following sample code demonstrates the power of regex by filling the %steps hash in one line of code.
Once the data is obtained, you can slice and dice it any way your heart desires.
Check the sample against your actual problem.
use strict;
use warnings;
use feature 'say';
use Data::Dumper;
my($str,%steps,$re);
$str = '1.Do this. 2.Then do that. 3.And then maybe that. 4.Complete!';
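$re = qr/(\d+)\.(\D+)[.!]/;   # a step may end with '.' or '!'
%steps = $str =~ /$re/g;
$Data::Dumper::Sortkeys = 1;  # stable key order in the dump
say Dumper(\%steps);
say "$_. $steps{$_}" for sort { $a <=> $b } keys %steps;
Output
$VAR1 = {
'1' => 'Do this',
'2' => 'Then do that',
'3' => 'And then maybe that',
'4' => 'Complete'
};
1. Do this
2. Then do that
3. And then maybe that
4. Complete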
Upvotes: 0
Reputation: 386426
I would indeed use split. But you need to exclude the digits from the match by using a lookahead.
my @steps = split /\s+(?=\d+\.)/, $steps;
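For comparison, here is a runnable sketch of that split next to the capturing-group attempt from the question (the variable name @tried is just for illustration). With capturing parentheses, split returns the matched separators as extra list elements, which is why the numbering ended up detached from the text.

```perl
use strict;
use warnings;
use feature 'say';

my $steps = "1.Do this. 2.Then do that. 3.And then maybe that. 4.Complete!";

# Lookahead: split on the whitespace only, leaving "N." with the next item.
my @steps = split /\s+(?=\d+\.)/, $steps;
say qq{"$_"} for @steps;
# "1.Do this."
# "2.Then do that."
# "3.And then maybe that."
# "4.Complete!"

# The question's attempt: the captured separator (" 2.", " 3.", " 4.")
# is consumed from the text and returned as its own element.
my @tried = split /(\s\d.)/, $steps;
say scalar(@tried), " elements";
# 7 elements
```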
Upvotes: 4