Vibration Of Life

Reputation: 3237

PHP extract values from a string

I'm processing records in PHP and was wondering if there is an efficient method to pull out the genre: values from each of the following records. genre: can be anywhere in the string.

In the following string I need to pull out the word "alternative" (last word)

[media:keywords] => upc:00602527365589,Records,mercury,artist:Neon Trees,Alternative,trees,neon,genre:alternative

In the following string I need to pull out "Latin / Pop,latino,Pop"

[media:keywords] => genre:Latin / Pop,latino,Pop,upc:00602527341217,artist:Luis Fonsi,luis,universal,Fonsi,Latin

In the following record I need to pull out "other"

[media:keywords] => upc:793018101530,andy,razor,Other,tie,genre:other,artist:Andy McKee,McKee,&

In the following record I need to pull out "rock,flotsam,jetsam"

[media:keywords] => and,upc:00602498572061,genre:rock,flotsam,jetsam,artist:Flotsam And Jetsam,rock,geffen

I'm pulling my hair out on this (what is left anyway).

Upvotes: 1

Views: 169

Answers (5)

Mariusz Sakowski

Reputation: 3280

The problem with parsing this string is that it has no proper delimiter or quoting: a comma separates fields, but it may also appear inside a field. It is the same problem as with CSV files without quotes.

If performance does not matter much to you, I would suggest parsing it in a more bulletproof way: make some assumptions about what a key is (such as artist, genre, upc, etc.) and introduce a proper delimiter. Proof-of-concept code would be (I have left the echoes in so you can see what's happening):

$string = "genre:Latin / Pop,latino,Pop,upc:00602527341217,artist:Luis Fonsi,luis,universal,Fonsi,Latin";
// Introduce a delimiter in front of every key (assumes '|' never occurs in the data)
$delimiter = '|';
$withDelimiter = preg_replace('/([a-z]+):/', $delimiter . '$0', $string);
echo $withDelimiter . "\n";

$genre = null; // stays null if no genre key is found
$fields = explode($delimiter, $withDelimiter);
foreach ($fields as $field) {
    if (strlen($field)) {
        echo $field . "\n";

        // Split into at most 2 parts; pad so a field without a colon does not raise a notice
        list($key, $valueWithPossiblyTrailingComma) = array_pad(explode(':', $field, 2), 2, '');

        if ($key === 'genre') {
            $genre = rtrim($valueWithPossiblyTrailingComma, ',');
            break;
        }
    }
}
echo $genre;

You can make it work in nearly all cases, and it allows you to find any key, not only genre, but its performance will be low.

I have made the following assumptions about your string:

  • it is a list of key => value pairs delimited by colon and concatenated with comma
  • key may have only [a-z] characters
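Under the same two assumptions, the "find any key" idea can be sketched in a single pass with preg_match_all(). This is only an illustration: the function name parseKeywords is hypothetical, and anything in front of the first key (such as the stray "and," in one of the sample records) is simply ignored.

```php
<?php
// Hypothetical helper: split a keyword string into key => value pairs.
// Assumes keys consist only of [a-z] and that a value runs until the
// next ",key:" or the end of the string.
function parseKeywords(string $keywords): array
{
    $result = [];
    preg_match_all('/([a-z]+):(.*?)(?=,[a-z]+:|$)/', $keywords, $matches, PREG_SET_ORDER);
    foreach ($matches as $m) {
        $result[$m[1]] = $m[2]; // $m[1] is the key, $m[2] the raw value
    }
    return $result;
}

$parsed = parseKeywords('genre:Latin / Pop,latino,Pop,upc:00602527341217,artist:Luis Fonsi,luis,universal,Fonsi,Latin');
echo $parsed['genre'] . "\n"; // Latin / Pop,latino,Pop
```

If a key occurs more than once in a record, the last occurrence wins in this sketch.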

Upvotes: 0

FtDRbwLXw6

Reputation: 28891

Use the following regular expression coupled with preg_match():

~\bgenre:(.+?)(?=(,[^:,]+:|$))~

Your desired result will be in the first capture group, i.e. index 1 of the matches array (the third parameter of preg_match()).
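A minimal sketch of how this expression might be applied to the sample records from the question. The helper name matchGenre is hypothetical, and it assumes each record's keyword list is already available as a plain string.

```php
<?php
// Hypothetical helper: return the genre value, or null if there is none.
function matchGenre(string $keywords): ?string
{
    // Lazily capture everything after "genre:" up to the next ",key:" or the end.
    if (preg_match('~\bgenre:(.+?)(?=(,[^:,]+:|$))~', $keywords, $matches)) {
        return $matches[1];
    }
    return null;
}

$records = [
    'upc:00602527365589,Records,mercury,artist:Neon Trees,Alternative,trees,neon,genre:alternative',
    'genre:Latin / Pop,latino,Pop,upc:00602527341217,artist:Luis Fonsi,luis,universal,Fonsi,Latin',
    'and,upc:00602498572061,genre:rock,flotsam,jetsam,artist:Flotsam And Jetsam,rock,geffen',
];

foreach ($records as $record) {
    echo matchGenre($record) . "\n";
}
```

The return value is checked because preg_match() returns 0 when nothing matches, in which case the matches array has no index 1.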

Upvotes: 2

Sandeep Bansal

Reputation: 6394

$mystring = 'abc';
$findme   = 'a';
$pos = strpos($mystring, $findme);

// Note our use of ===.  Simply == would not work as expected
// because the position of 'a' was the 0th (first) character.
if ($pos === false) {
    echo "The string '$findme' was not found in the string '$mystring'";
} else {
    echo "The string '$findme' was found in the string '$mystring'";
    echo " and exists at position $pos";
}

From the PHP Documentation for strpos

So you can just use $findme = "alternative"

Upvotes: 0

mario

Reputation: 145482

You can indeed use a bit of pattern detection. You are always looking for the fixed genre: followed by one or more words or phrases, neither of which may itself contain a :

So this might suffice:

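// The outer group captures the whole value list; a repeated group alone
// would only keep its last repetition (e.g. ",Pop").
if (preg_match('~\bgenre:((?:,?[^:,]+(?=,|$))+)~', $media_keywords, $match)) {
    print $match[1];
}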

Upvotes: 0

Iggy Van Der Wielen

Reputation: 124

I would use strpos() to find where the genre starts. The only problem is where to end it, because you do not have a delimiter. I would use the other known keywords, such as "upc" and "artist", to check where the string needs to be cut off at the end.
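A rough sketch of that approach, assuming "upc" and "artist" are the only other keys that can follow the genre. The function name cutGenre and the default key list are hypothetical and would need to be extended for real data.

```php
<?php
// Hypothetical helper: cut the genre value out using strpos() only.
// $knownKeys is an assumed list of the other keys that may appear.
function cutGenre(string $keywords, array $knownKeys = ['upc', 'artist']): ?string
{
    $start = strpos($keywords, 'genre:');
    if ($start === false) {
        return null; // no genre in this record
    }
    $start += strlen('genre:');

    // The value ends at the nearest following ",key:" or at the end of the string.
    $end = strlen($keywords);
    foreach ($knownKeys as $key) {
        $pos = strpos($keywords, ',' . $key . ':', $start);
        if ($pos !== false && $pos < $end) {
            $end = $pos;
        }
    }
    return substr($keywords, $start, $end - $start);
}

echo cutGenre('and,upc:00602498572061,genre:rock,flotsam,jetsam,artist:Flotsam And Jetsam,rock,geffen');
// rock,flotsam,jetsam
```

This avoids regular expressions entirely, but a key that is missing from the list would be swallowed into the genre value.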

Upvotes: 0
