Gislef

Reputation: 1637

Regex for two patterns

I have a very large CSV list. I've already converted it into an array and managed to fix a problem I was having with UTF-8:

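// Note: get_template_directory_uri() returns the theme's URL;
// get_template_directory() returns its local filesystem path.
$listLines = '';
$lines = file(get_template_directory_uri() . '/lines.csv');

foreach ($lines as $line_num => $line)
{
    if (mb_detect_encoding($line, 'utf-8', false)) {
        $listLines .= $line . '<br />';
    }
}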
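If the aim of the mb_detect_encoding() check is to skip lines that are not valid UTF-8, a strict validity test is one alternative. This sketch relies on the fact that preg_match() with the /u modifier returns false when the subject is not valid UTF-8. The helper names is_valid_utf8 and filter_utf8_lines are hypothetical, and the function takes an array so it can run without WordPress:

```php
<?php
// Hypothetical helper: true only if $s is valid UTF-8.
// preg_match() with the /u modifier returns false for an invalid
// UTF-8 subject, so an empty pattern works as a validity check.
function is_valid_utf8(string $s): bool
{
    return preg_match('//u', $s) === 1;
}

// Hypothetical helper: keep only the valid UTF-8 lines, reindexed from 0.
function filter_utf8_lines(array $lines): array
{
    return array_values(array_filter($lines, 'is_valid_utf8'));
}
```

In the theme, the array could come from something like file(get_template_directory() . '/lines.csv', FILE_IGNORE_NEW_LINES), assuming the CSV lives in the theme directory as in the question.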

All of the list items follow one of the two patterns below:

First

Adolfo (São Paulo)|Adolfo (SP)

Basically, I need everything before the |, with spaces replaced by underscores. Output:

Adolfo_(São_Paulo)

Second

The other items in the list do not contain a |:

Abatiá (PR)    
Abel Figueiredo (PA)
São Francisco de Assis do Piauí (PI)

I need this output:

Abatiá
Abel_Figueiredo
São_Francisco_de_Assis_do_Piauí

I believe I'll have to use a regex, but I'm confused about how to write one rule that handles both situations.
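One way to express a single rule for both situations is to capture everything up to either a | or a final parenthesized group. This is a minimal sketch, assuming every line either contains a | (first pattern) or ends with a group such as (PR) (second pattern); the function name extract_name is hypothetical:

```php
<?php
// Hypothetical helper: one regex for both line formats.
// Assumes each line either contains "|" (pattern 1) or ends with a
// parenthesized group such as "(PR)" (pattern 2).
function extract_name(string $line): ?string
{
    // [^|]+? cannot cross the bar, so pattern 1 stops at "|";
    // otherwise the trailing "(...)" group is left out of the capture.
    $pattern = '/^([^|]+?)\s*(?:\|.*|\([^()]*\))$/u';
    if (preg_match($pattern, trim($line), $m) !== 1) {
        return null; // the line matches neither format
    }
    // Turn each run of whitespace into a single underscore.
    return preg_replace('/\s+/u', '_', $m[1]);
}
```

For example, extract_name('Adolfo (São Paulo)|Adolfo (SP)') gives Adolfo_(São_Paulo), and extract_name('Abatiá (PR)') gives Abatiá.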

Upvotes: 0

Views: 61

Answers (2)

Arnab Mukherjee

Reputation: 21

Check whether "|" is present in the string. If it is, split on the bar and keep only the first substring. If it's not, split on spaces and keep all substrings except the last one.

This should work for your data as long as every element follows one of the two patterns you described and there is no third type of string.
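The split-based approach described above could be sketched in PHP as follows. The function name split_name is hypothetical, and joining the pieces with underscores is an assumption taken from the output the question asks for, not part of the answer itself:

```php
<?php
// Hypothetical helper implementing the split-based approach.
function split_name(string $line): string
{
    $line = trim($line);
    if (strpos($line, '|') !== false) {
        // Pattern 1: keep only the first substring before the bar.
        $name = explode('|', $line, 2)[0];
        $words = preg_split('/\s+/u', trim($name), -1, PREG_SPLIT_NO_EMPTY) ?: [];
    } else {
        // Pattern 2: split on whitespace and drop the last token, e.g. "(PR)".
        $words = preg_split('/\s+/u', $line, -1, PREG_SPLIT_NO_EMPTY) ?: [];
        array_pop($words);
    }
    // Join with underscores to match the output the question asks for.
    return implode('_', $words);
}
```

As the answer notes, a line of some third shape (say, a single word with no bar and no suffix) would come back empty or wrong, since the last token is dropped unconditionally.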

Upvotes: 1

D.B.

Reputation: 1782

Based on the comments, how about this:

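$listLines = '';
$lines = file(get_template_directory_uri() . '/lines.csv');

foreach ($lines as $line_num => $line)
{
    if (mb_detect_encoding($line, 'utf-8', false)) {
        // Pattern 1 ("Name|Alias"): take everything before the bar.
        // Pattern 2 ("Name (UF)"): take everything before the first "(".
        // The /u modifier makes PCRE treat the subject as UTF-8.
        if (strpos($line, '|') !== false) {
            $exp = '/^(.+?)\s*\|/u';
        } else {
            $exp = '/^(.+?)\s*\(/u';
        }
        if (preg_match($exp, $line, $matches)) {
            // Replace each run of whitespace with a single underscore.
            $line = preg_replace('/\s+/u', '_', $matches[1]);
            $listLines .= $line . '<br />';
        }
    }
}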
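To try this answer's logic outside WordPress, the loop body can be pulled into a function and run over the sample lines from the question. A minimal sketch; the function name db_extract is hypothetical, and it returns null for a line that matches neither pattern (the loop above simply skips such lines):

```php
<?php
// Hypothetical helper wrapping the two-regex logic from the answer.
function db_extract(string $line): ?string
{
    // Choose the pattern depending on whether the line contains a bar.
    $exp = strpos($line, '|') !== false
        ? '/^(.+?)\s*\|/u'   // "Name|Alias": text before the bar
        : '/^(.+?)\s*\(/u';  // "Name (UF)": text before the first "("
    if (preg_match($exp, $line, $matches) !== 1) {
        return null; // neither pattern matched
    }
    return preg_replace('/\s+/u', '_', $matches[1]);
}

// Sample lines from the question, with newlines as file() would keep them.
$samples = [
    "Adolfo (São Paulo)|Adolfo (SP)\n",
    "Abatiá (PR)    \n",
    "Abel Figueiredo (PA)\n",
    "São Francisco de Assis do Piauí (PI)\n",
];
foreach ($samples as $sample) {
    echo db_extract($sample), "<br />\n";
}
```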

Upvotes: 1
