Sky

Reputation: 121

Extract data from a string with Perl

I have the string "-test aaaa -machine bbb -from ccc".

How can I extract "aaaa", "bbb", and "ccc" using a regular expression?

It should also work when the string is "-from   ccc   -test    aaaa    -machine bbb"
(different order, multiple spaces, and so on).

I tried the following code, but it always captures the wrong data.

$str = "-test aaaa     -machine  bbb  -from ccc";
$str =~ /-test\s*(.*)\s*/;

Printing $1 gives:

aaaa   -machine  bbb  -from ccc

I also want to handle the following case:

-test aa_aa -machine aab-baa-aba -from ccc

Upvotes: 2

Views: 410

Answers (4)

Breakpoint404

Reputation: 1

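This should do the trick, as long as the options always appear in this order:

$str = "-test aa_aa     -machine  aab-baa-aba  -from ccc";
# \s+(\S+) captures each value without the surrounding whitespace
($test,$machine,$from) = $str =~ /-test\s+(\S+)\s+-machine\s+(\S+)\s+-from\s+(\S+)/;

print "Test: $test, Machine: $machine, From: $from\n";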

Upvotes: -2

Dave Cross

Reputation: 69224

I'm going to answer the question that (I think) underlies your question - not the question that you asked.

It looks to me like you are parsing command-line options. So use a command-line option parser, rather than reinventing that for yourself. Getopt::Long is part of the standard Perl distribution.

#!/usr/bin/perl

use strict;
use warnings;
# We use modern Perl (here, specifically, say())
use 5.010;

use Getopt::Long 'GetOptionsFromString';
use Data::Dumper;

my %options;

my $str = '-test aa_aa -machine aab-baa-aba -from ccc';
GetOptionsFromString($str, \%options, 'test=s', 'machine=s', 'from=s');

say Dumper \%options;

Normally, you'd use the function GetOptions() as you're parsing the command-line options that are available in @ARGV. I'm not sure how the options ended up in your string, but there's a useful GetOptionsFromString() function for this situation.
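
GetOptionsFromString() also reports whether parsing succeeded. As a hedged sketch: in list context it returns the status together with a reference to any leftover arguments, so unexpected input can be detected. The variable names $ok and $rest are illustrative, and the input string is the reordered, multi-space variant from the question.

```perl
use strict;
use warnings;

use Getopt::Long 'GetOptionsFromString';

my %options;
my $str = '-from   ccc   -test    aa_aa    -machine aab-baa-aba';

# List context: a status flag plus an array ref of unparsed arguments.
my ($ok, $rest) = GetOptionsFromString($str, \%options,
    'test=s', 'machine=s', 'from=s');

die "Could not parse options\n" unless $ok;
warn "Unexpected arguments: @$rest\n" if @$rest;

print "$_ = $options{$_}\n" for sort keys %options;
```

Because the parser does the splitting, neither the order of the options nor the amount of whitespace between them matters.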

Update: To explain why your code didn't work.

$str = "-test aa_aa     -machine  aab-baa-aba  -from ccc";
$str =~ /-test\s*(.*)\s*/;

You're capturing what matches (.*). But .* is greedy. That is, it matches as much data as it can. And, in this case, that means it matches until the end of the line. There are (at least!) a couple of ways to fix this.

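1/ Make the match non-greedy by adding ?. A non-greedy match needs something after it to stop at (here, whitespace or the end of the string); otherwise it matches nothing at all.

$str =~ /-test\s*(.*?)(?:\s|\z)/;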

2/ Be more explicit about what you're looking for - in this case non-whitespace characters.

$str =~ /-test\s*(\S*)\s*/;
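
Matching non-whitespace does not depend on where the option sits in the string, so the same idea extends to all three options. A minimal sketch, assuming the option names are known in advance and the values contain no whitespace (the %value hash and the loop are illustrative, not part of the question):

```perl
use strict;
use warnings;

my $str = "-from   ccc   -test    aa_aa    -machine aab-baa-aba";

my %value;
for my $name (qw(test machine from)) {
    # (?:^|\s) keeps "-test" from matching inside another value;
    # (\S+) captures the run of non-whitespace characters that follows.
    ($value{$name}) = $str =~ /(?:^|\s)-\Q$name\E\s+(\S+)/;
}

print "$_: $value{$_}\n" for qw(test machine from);
```

An option that is missing from the string simply leaves its entry undefined, so the caller can check for that before using the value.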

Upvotes: 6

yoniyes

Reputation: 1020

You don't have to use a regex; you can split the string into a hash instead.

use strict;
use warnings;
use Data::Dumper;

my $str = '-test aaaa   -machine  bbb  -from ccc';
my %field = split ' ', $str;
print Dumper(\%field);

The output:

$VAR1 = {
          '-from' => 'ccc',
          '-machine' => 'bbb',
          '-test' => 'aaaa'
        };

Whatever the order, split returns a flat list of the form (word1, word2, word3, word4, word5, word6), where word1, word3, and word5 are the -field_name entries. Assigning that list to a hash pairs each field name with the word that follows it, so to get the string after -test, for example, you just use $field{"-test"}.

EDIT: It doesn't matter how many spaces there are between the words, or which non-whitespace characters the words contain. It works the same way in all cases, as long as the string keeps the format -some_field something -another_field another_thing ...
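
The keys keep their leading dash, and the approach relies on every option having exactly one value. A minimal sketch of guarding against that and normalising the keys (the %opt hash and the pair check are illustrative additions, not part of the answer above):

```perl
use strict;
use warnings;

my $str = '-from   ccc   -test    aa_aa    -machine aab-baa-aba';

# split ' ' skips leading whitespace and splits on runs of whitespace.
my @words = split ' ', $str;
die "Expected name/value pairs\n" if @words % 2;

my %field = @words;

# Copy into a second hash whose keys have the leading dash removed.
my %opt;
for my $key (keys %field) {
    (my $name = $key) =~ s/^-//;
    $opt{$name} = $field{$key};
}

print "test=$opt{test} machine=$opt{machine} from=$opt{from}\n";
```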

Upvotes: 7

Chankey Pathak

Reputation: 21666

my @matches;
# An option name (-word), whitespace, then a captured value of word characters and hyphens
my $regex = qr/-\w+\s+([\w-]+)/;

my $string = q{-test aaaa -machine bbb -from ccc};
@matches = $string =~ /$regex/g;
print "Matches for first string are: @matches\n";

my $other_string = q{-from   ccc   -test    aaaa    -machine bbb};
@matches = $other_string =~ /$regex/g;
print "Matches for second string are: @matches\n";

my $third_string = q{-test aa_aa -machine aab-baa-aba -from ccc};
@matches = $third_string =~ /$regex/g;

print "Matches for third string are: @matches\n";
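
The same kind of pattern can capture the option names as well, so each value can be looked up by name whatever the order. A sketch, assuming names consist of word characters and values of word characters and hyphens (the second capture group and the %opt hash are illustrative additions):

```perl
use strict;
use warnings;

# Two capture groups: the option name, then its value.
my $pair = qr/-(\w+)\s+([\w-]+)/;

my $string = q{-from   ccc   -test    aa_aa    -machine aab-baa-aba};

# With /g in list context, the match returns name, value, name, value, ...
my %opt = $string =~ /$pair/g;

print "$_ => $opt{$_}\n" for sort keys %opt;
```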

Upvotes: 1
