Roman Sobol

Reputation: 75

Separating names into surname and initials

I have a list of surnames with initials, separated by commas and stored in a variable called $author:

Shevchuk T.I., Piskun R.P., Vasenko T.B.

I need to split the surnames and the initials into separate variables.

Examples of the name formats:

Belemets N.I. / N.I. Belemets / N. I. Belemets / Belemets N. I. / Belemets N. / N. Belemets / Nu. Belemets / Belemets Nu.

I am currently trying to do it like this:

$str_arr1= explode(", ", $author);
$initials= preg_split('([A-Z]\.[A-Z]\.|[A-Z]\.\s+[A-Z]\.|[A-Z][a-z]\.)', $str_arr1);
$surnames= preg_split('\w{3,15}', $str_arr1);

Example output of print_r($str_arr1):

Array
(
    [0] => Gunas I. V.
    [1] => Babych L. V.
    [2] => Cherkasov E. V.
)

But $initials and $surnames do not output anything. What could be the problem? I am using the MODX CMS.

Thanks in advance!

UPD:

Now the code looks like this:

$str_arr= explode(", ", $author);
foreach($str_arr as $value){
    $preinitial= preg_split('/([A-Z]\.[A-Z]\.|[A-Z]\.\s+[A-Z]\.|[A-Z][a-z]\.\s+[A-Z]\.|[A-Z][a-z]\.)/', $value, -1, PREG_SPLIT_NO_EMPTY);
    $presurname= preg_split('/\w{3,15}/', $value, -1, PREG_SPLIT_NO_EMPTY);
    $initial = implode("", $preinitial);
    $surname = implode("", $presurname);
    echo '<given_name>'.$surname.'</given_name>';
    echo '<surname>'.$initial.'</surname>';
    echo "\r\n";
}

Upvotes: 0

Views: 476

Answers (1)

chris85

Reputation: 23892

Your implementation has a few issues. preg_split expects a string as its subject, not an array, and the pattern needs delimiters. You should also pass the PREG_SPLIT_NO_EMPTY flag so that empty values are not returned. Your variable names are inverted as well: the split removes whatever is matched, so $initials actually holds the surname and $surnames actually holds the initials.

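$author = 'Shevchuk T.I., Piskun R.P., Vasenko T.B.';
$str_arr1= explode(", ", $author);
foreach($str_arr1 as $str_arr) {
    // preg_split() drops whatever the pattern matches, so splitting on the
    // initials leaves the surname in $initials ...
    $initials= preg_split('/([A-Z]\.[A-Z]\.|[A-Z]\.\s+[A-Z]\.|[A-Z][a-z]\.)/', $str_arr, -1, PREG_SPLIT_NO_EMPTY);
    // ... and splitting on the 3-15 character word leaves the initials in $surnames.
    $surnames= preg_split('/\w{3,15}/', $str_arr, -1, PREG_SPLIT_NO_EMPTY);
    print_r($initials);
    print_r($surnames);
}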

Demo: https://3v4l.org/1sgmX
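If you want the values in correctly named variables rather than printed arrays, a minimal sketch could look like the following. It reuses the same two patterns as the loop above; the $people array and the $surnameParts / $initialParts names are only illustrative and are not part of your code. Note the trim() calls: the pieces left over after a split still carry the space that separated the surname from the initials.

```php
<?php
$author = 'Shevchuk T.I., Piskun R.P., Vasenko T.B.';
$people = []; // hypothetical result list, one entry per author

foreach (explode(', ', $author) as $entry) {
    // Splitting on the initials leaves the surname behind.
    $surnameParts = preg_split('/([A-Z]\.[A-Z]\.|[A-Z]\.\s+[A-Z]\.|[A-Z][a-z]\.)/', $entry, -1, PREG_SPLIT_NO_EMPTY);
    // Splitting on the 3-15 character word leaves the initials behind.
    $initialParts = preg_split('/\w{3,15}/', $entry, -1, PREG_SPLIT_NO_EMPTY);

    $people[] = [
        // The leftover pieces keep the separating space, so trim them.
        'surname'  => trim(implode('', $surnameParts)),
        'initials' => trim(implode('', $initialParts)),
    ];
}

print_r($people);
```

This keeps the split-based approach, so it inherits the same limits as the patterns themselves.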

I'd recommend the ParsCit library, https://github.com/knmnyn/ParsCit, which I've used successfully to parse full references. You can probably pull out just the logic that parses the authors.

The {3,15} length check on the surname also won't work in all cases. For example, in https://www.ncbi.nlm.nih.gov/pubmed/29052443 the author Hong Yu won't be matched, because the surname is only 2 characters long.
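One possible way around the length limit, as a rough sketch rather than a tested solution, is to match the surname and the initials directly with preg_match instead of splitting around them. The helper name splitAuthor and the assumed surname shape (one capital letter followed by lowercase letters) are my own assumptions for illustration; hyphenated or multi-word surnames would need a wider pattern.

```php
<?php
// Hypothetical helper, not part of the code above: match the pieces
// directly instead of splitting around them.
function splitAuthor(string $name): ?array
{
    // One initial: capital letter, optional lowercase letter, dot ("N." or "Nu.").
    $initial  = '\p{Lu}\p{Ll}?\.';
    // One or two initials, optionally separated by whitespace ("N.I." or "N. I.").
    $initials = $initial . '(?:\s*' . $initial . ')?';
    // Assumed surname shape: a capital letter followed by lowercase letters.
    $surname  = '\p{Lu}\p{Ll}+';

    $patterns = [
        // Surname first: "Belemets N.I."
        '/^(?<surname>' . $surname . ')\s+(?<initials>' . $initials . ')$/u',
        // Initials first: "N.I. Belemets"
        '/^(?<initials>' . $initials . ')\s+(?<surname>' . $surname . ')$/u',
    ];

    foreach ($patterns as $pattern) {
        if (preg_match($pattern, trim($name), $m)) {
            return [
                'surname'  => $m['surname'],
                // Normalize "N. I." to "N.I."
                'initials' => preg_replace('/\s+/', '', $m['initials']),
            ];
        }
    }

    return null; // format not recognized
}

$author = 'Shevchuk T.I., N. I. Belemets, Yu H.';
foreach (explode(',', $author) as $entry) {
    print_r(splitAuthor($entry));
}
```

Because the surname pattern has no minimum length beyond two letters, a short surname such as Yu is accepted, and both orderings from the question are covered by trying the two patterns in turn.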

Upvotes: 2
