Adrian

Reputation: 2291

Getting values after a specific character and stopping on other specific characters regex

I am trying to capture the values that follow a specific character and stop at certain other characters. This is what I tried:

$whois = 'Registrant Name: Domain Administrator Registrant Organization: Yahoo! Inc. Registrant Street: 701 First Avenue Registrant City: Sunnyvale';
$data = preg_match_all('/:\s(.*?)\s/', $whois, $data_whois);
var_dump($data_whois[1]);

The whois is for yahoo: http://whois.domaintools.com/yahoo.com

CURRENT OUTPUT

  1 => string 'Domain' (length=6)
  2 => string 'Yahoo!' (length=6)
  3 => string '701' (length=3)
  4 => string 'Sunnyvale' (length=9)

EXPECTED OUTPUT

  1 => string 'Domain Administrator' (length=20)
  2 => string 'Yahoo! Inc.' (length=11)
  3 => string '701 First Avenue' (length=16)
  4 => string 'Sunnyvale' (length=9)

But it only takes the first word of each value. I believe that is because of (.*?)\s. I also tried (.*?\s.*?)\s, which takes the second word as well, but if a value has no second word it captures the word Registrant instead. So I need the match to stop at Registrant, but I don't understand exactly how to do that.

Upvotes: 0

Views: 38

Answers (2)

Avinash Raj

Reputation: 174696

It seems that each field label consists of exactly two words, with a colon after the second word. If so, you can try the regex below.

: \K.*?(?= \S+ \S+:|$)


The PHP code would be:

<?php
$data = 'Registrant Name: Domain Administrator Registrant Organization: Yahoo! Inc. Registrant Street: 701 First Avenue Registrant City: Sunnyvale';
$regex =  '~: \K.*?(?= \S+ \S+:|$)~';
preg_match_all($regex, $data, $matches);
print_r($matches);
?>

Output:

Array
(
    [0] => Array
        (
            [0] => Domain Administrator
            [1] => Yahoo! Inc.
            [2] => 701 First Avenue
            [3] => Sunnyvale
        )

)

Upvotes: 1

Casimir et Hippolyte

Reputation: 89557

Since you are using a lazy quantifier, .*?, followed by \s, the match stops at the first whitespace character.

A way to solve the problem is to use the fact that .*? must be followed by a space and the word "Registrant" or the end of the string:

/:\s(.*?)(?:\sRegistrant\b|\s*$)/
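As a quick check, the pattern above can be dropped into the question's own preg_match_all call. This is a minimal sketch that assumes the single-line input string from the question; the variable names $whois and $matches are only illustrative.

```php
<?php
// Input string taken from the question (a single line, no newlines).
$whois = 'Registrant Name: Domain Administrator Registrant Organization: Yahoo! Inc. Registrant Street: 701 First Avenue Registrant City: Sunnyvale';

// The lazy group stops at the next " Registrant" or at the end of the string.
$pattern = '/:\s(.*?)(?:\sRegistrant\b|\s*$)/';

preg_match_all($pattern, $whois, $matches);

// $matches[1] holds the content of the capture group for each match.
print_r($matches[1]);
// Array
// (
//     [0] => Domain Administrator
//     [1] => Yahoo! Inc.
//     [2] => 701 First Avenue
//     [3] => Sunnyvale
// )
```

Note that the dot does not match newlines by default, so a multi-line whois record would need a different pattern or the s modifier.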

Another possible way is to use preg_split:

$str = 'Registrant Name: Domain Administrator Registrant Organization: Yahoo! Inc. Registrant Street: 701 First Avenue Registrant City: Sunnyvale';

$pattern = '~\s*\bRegistrant[^:]+:\s*~';
$result = preg_split($pattern, $str, -1, PREG_SPLIT_NO_EMPTY);
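With the input above, $result should contain the four values, because each "Registrant ...:" label is consumed as a delimiter. That also means the labels are discarded. If they are needed too, one option is to capture both parts and pair them up. The following is only a sketch under the same single-line assumption; the pattern and the names $labels and $fields are illustrative and not part of the original answer.

```php
<?php
$str = 'Registrant Name: Domain Administrator Registrant Organization: Yahoo! Inc. Registrant Street: 701 First Avenue Registrant City: Sunnyvale';

// Group 1: the label after "Registrant" (everything up to the colon).
// Group 2: the value, matched lazily up to the next " Registrant" or the end of the string.
$pattern = '~\bRegistrant\s+([^:]+):\s(.*?)(?=\sRegistrant\b|\s*$)~';

preg_match_all($pattern, $str, $m);

// Pair each label with its value.
$labels = $m[1];
$fields = array_combine($labels, $m[2]);

print_r($fields);
// Array
// (
//     [Name] => Domain Administrator
//     [Organization] => Yahoo! Inc.
//     [Street] => 701 First Avenue
//     [City] => Sunnyvale
// )
```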

Upvotes: 2
