regex of two patterns

Question

I have a very large CSV list. I've already read it into an array and managed to fix a problem I was having with UTF-8:

    $lines = file(get_template_directory_uri() . '/lines.csv');
    $listLines = '';

    foreach ($lines as $line_num => $line)
    {
        if (mb_detect_encoding($line, 'utf-8', false)) {
            $listLines .= $line . "\n";
        }
    }

But all of the list items follow one of the two patterns below:

First

Adolfo (São Paulo)|Adolfo (SP)

Basically, I need everything before the |, with spaces replaced by underscores. Expected output:

Adolfo_(São_Paulo)

Second

Other items in the list do not contain |:

Abatiá (PR)    
Abel Figueiredo (PA)
São Francisco de Assis do Piauí (PI)

For these I need the output:

Abatiá
Abel_Figueiredo
São_Francisco_de_Assis_do_Piauí

I believe I'm going to have to use regex, but I'm not sure how to write a rule that covers both situations.

D.B. · Accepted Answer

Based on comments... how about this:

$lines = file(get_template_directory_uri() . '/lines.csv');
$listLines = ''; // initialize before appending to it

foreach ($lines as $line_num => $line)
{
    if(mb_detect_encoding($line, 'utf-8', false)) {
        // Lines with a pipe: capture everything before the "|".
        // Lines without one: capture everything before the first "(".
        if(strpos($line, '|')!==FALSE){
            $exp = '/^(.+?)\s*\|/';
        }else{
            $exp = '/^(.+?)\s*\(/';
        }
        if(preg_match($exp, $line, $matches)){
            // Replace each run of whitespace with a single underscore.
            $line = preg_replace('/\s+/', '_', $matches[1]);
            $listLines .= $line . "\n";
        }
    }
}
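
If it helps, the same two rules can be wrapped in a small function that is easy to test on its own. This is only a sketch, not part of the code above: the name `normalize_city_line` is made up for illustration, and it assumes each line is valid UTF-8 (the `u` modifier makes `preg_replace` return `null` otherwise). It also differs slightly from the loop above: a line with no `|` loses only its trailing `(..)` group, and a line with neither `|` nor parentheses is kept rather than skipped.

```php
<?php
// Hypothetical helper: returns the normalized name, or null for an
// empty line or a line that is not valid UTF-8.
function normalize_city_line(string $line): ?string
{
    $line = trim($line);
    if ($line === '') {
        return null;
    }

    if (strpos($line, '|') !== false) {
        // Pattern 1: keep everything before the first "|".
        $name = strstr($line, '|', true);
    } else {
        // Pattern 2: drop the trailing "(XX)" group, if there is one.
        $name = preg_replace('/\s*\([^()]*\)\s*$/u', '', $line);
        if ($name === null) {
            return null; // invalid UTF-8
        }
    }

    // Replace each run of whitespace with a single underscore.
    return preg_replace('/\s+/u', '_', trim($name));
}
```

Inside the loop you would then call `normalize_city_line($line)` and append the result to `$listLines` whenever it is not `null`.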
